METHYL-D-ERYTHRITOL PHOSPHATE PATHWAY GENES

This application claims the benefit under 35 U.S.C. § 119(e) of U.S. Provisional
Patent Application Serial No. 60/223,483 filed August 7, 2000.

The present invention is in the field of plant genetics and biochemistry. More
specifically, the invention relates to genes associated with the methyl-D-erythritol phosphate
(MEP) pathway. The present invention provides and includes nucleic acid molecules,
proteins, and antibodies associated with the genes of the MEP pathway and also provides
methods utilizing such agents, for example in gene isolation, gene analysis and the
production of transgenic plants. Moreover, the present invention includes transgenic plants
modified to express proteins associated with the MEP pathway and methods for the
production of products from the MEP pathway.

Tocopherols are an important component of mammalian diets. Epidemiological
evidence indicates that tocopherol supplementation can result in decreased risk for
cardiovascular disease and cancer, can aid in immune function, and is associated with
prevention or retardation of a number of degenerative disease processes in humans.

Tocopherols function, in part, by stabilizing the lipid bilayer of biological membranes,
reducing polyunsaturated fatty acid (PUFA) free radicals generated by lipid oxidation, and
scavenging oxygen free radicals, lipid peroxy radicals and singlet oxygen species.

α-Tocopherol, often referred to as vitamin E, belongs to a class of lipid-soluble
antioxidants that includes α-, β-, γ-, and δ-tocopherols and α-, β-, γ-, and δ-tocotrienols.
Although α-, β-, γ-, and δ-tocopherols and α-, β-, γ-, and δ-tocotrienols are sometimes referred to
collectively as "vitamin E", vitamin E is more appropriately defined chemically as
α-tocopherol. α-Tocopherol is significant for human health, in part because it is readily
absorbed and retained by the body, and therefore has a higher degree of bioactivity than other
tocopherol species. However, other tocopherols, such as β-, γ-, and δ-tocopherols, also have
significant health and nutritional benefits.

Tocopherols are synthesized almost exclusively by plants and certain other
photosynthetic organisms, including cyanobacteria. As a result, mammalian dietary
tocopherols are obtained almost exclusively from these sources. Plant tissues vary
considerably in total tocopherol content and tocopherol composition, with α-tocopherol the
predominant tocopherol species found in green, photosynthetic plant tissues. Leaf tissue can
contain from 10-50 µg of total tocopherols per gram fresh weight, but most of the world's
major staple crops (e.g., rice, corn, wheat, potato) produce low to extremely low levels of
total tocopherols, of which only a small percentage is α-tocopherol. Oilseed crops generally
contain much higher levels of total tocopherols, but α-tocopherol is present only as a minor
component in most oilseeds.

The recommended human daily dietary intake of 15-30 mg of vitamin E is quite
difficult to achieve from the average American diet. For example, it would take over 750
grams of spinach leaves, in which α-tocopherol comprises 60% of total tocopherols, or 200-400
grams of soybean oil to satisfy this recommended daily vitamin E intake. While it is
possible to augment the diet with supplements, most of these supplements contain primarily
synthetic vitamin E, having eight stereoisomers, whereas natural vitamin E is predominantly
composed of only a single isomer. Furthermore, supplements tend to be relatively expensive,
and the general population is disinclined to take vitamin supplements on a regular basis.
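The spinach figure above can be checked with simple arithmetic. The sketch below uses only numbers stated in this section (the 15 mg lower bound of the recommended intake, 750 g of spinach, α-tocopherol at 60% of total tocopherols) to infer the concentration those figures imply, and compares it with the 10-50 µg per gram fresh weight quoted earlier for leaf tissue. Because the text says "over 750 grams", the inferred value is an upper bound; the helper name is illustrative only.

```python
def implied_concentration_ug_per_g(target_mg: float, grams: float) -> float:
    """Concentration (ug/g) at which `grams` of food would supply `target_mg`."""
    return target_mg * 1000.0 / grams  # 1 mg = 1000 ug


# Figures taken from the text: 15 mg/day (lower bound), 750 g spinach,
# alpha-tocopherol = 60% of total tocopherols.
alpha_ug_per_g = implied_concentration_ug_per_g(15.0, 750.0)
total_ug_per_g = alpha_ug_per_g / 0.60

print(f"alpha-tocopherol: {alpha_ug_per_g:.1f} ug/g fresh weight")
print(f"total tocopherols: {total_ug_per_g:.1f} ug/g fresh weight")

# The implied total should fall inside the 10-50 ug/g range given for leaf tissue.
assert 10.0 <= total_ug_per_g <= 50.0
```

The script prints 20.0 µg/g for α-tocopherol and 33.3 µg/g for total tocopherols, which lies inside the quoted leaf-tissue range, so the spinach example is internally consistent.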

In addition to the health benefits of tocopherols, increased α-tocopherol levels in
crops have been associated with enhanced stability and extended shelf life of fresh and
processed plant products. Further, tocopherol supplementation of swine, beef, and poultry
feeds has been shown to significantly increase meat quality and extend the shelf life of
post-processed meat products by retarding post-processing lipid oxidation, which contributes to
undesirable flavor components.

Tocopherols are members of the class of compounds referred to as the isoprenoids.
Other isoprenoids include carotenoids, gibberellins, terpenes, chlorophyll and abscisic acid.
The chloroplasts of higher plants exhibit interconnected biochemical pathways leading to
secondary metabolites including tocopherols. One tocopherol biosynthetic pathway in higher
plants involves condensation of homogentisic acid and phytylpyrophosphate to form
2-methyl-6-phytylplastoquinol.

This plant tocopherol pathway can be divided into four parts: 1) synthesis of
homogentisic acid (HGA), which contributes to the aromatic ring of tocopherol; 2) synthesis of
phytylpyrophosphate, which contributes to the side chain of tocopherol; 3) joining of HGA
and phytylpyrophosphate via a prenyltransferase, followed by cyclization; and 4)
S-adenosyl methionine-dependent methylation of the aromatic ring, which affects the
relative abundance of each of the tocopherol species.

Homogentisic acid (HGA) is the common precursor to both tocopherols and
plastoquinones. In at least some bacteria, the synthesis of HGA is reported to occur via the
conversion of chorismate to prephenate and then to p-hydroxyphenylpyruvate via a
bifunctional prephenate dehydrogenase. Examples of bifunctional bacterial prephenate
dehydrogenase enzymes include the proteins encoded by the tyrA genes of Erwinia herbicola
and Escherichia coli. The tyrA gene product catalyzes the production of prephenate from
chorismate, as well as the subsequent dehydrogenation of prephenate to form
p-hydroxyphenylpyruvate (p-HPP), the immediate precursor to HGA. p-HPP is then converted
to HGA by hydroxyphenylpyruvate dioxygenase (HPPD). In contrast, plants are believed to
lack prephenate dehydrogenase activity, and it is generally believed that in plants the
synthesis of HGA from chorismate proceeds via the synthesis and conversion of the intermediate
arogenate. Because pathways involved in HGA synthesis are also responsible for tyrosine
formation, any alteration in these pathways can also alter the synthesis of tyrosine and
other aromatic amino acids.

HGA is then combined with either phytyl-pyrophosphate or solanesyl-pyrophosphate
by phytyl/prenyl transferase to form methyl-plastoquinols, which are precursors to
plastoquinones and tocopherols. The major structural difference among the tocopherol
species is the position of the methyl groups around the phenyl ring. This methylation
process is S-adenosyl methionine-dependent. Methyl Transferase 1 (MT1) catalyzes the
formation of plastoquinol-9 and γ-tocopherol by methylation of the 7 position. Subsequent
methylation at the 5 position of γ-tocopherol by γ-tocopherol methyltransferase generates
the biologically active α-tocopherol.
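The methylation patterns that distinguish the tocopherol species can be represented as sets of ring positions. The sketch below assumes the standard chromanol-ring numbering (α: 5, 7, 8; β: 5, 8; γ: 7, 8; δ: 8), which is general chemistry knowledge rather than something stated in this section, and it models only the ring positions, not the enzymes or the quinol intermediates. It shows how adding a methyl group at position 5 turns the γ pattern into the α pattern; the names are illustrative.

```python
from typing import Optional

# Ring methyl positions of the four tocopherol species (chromanol numbering).
RING_METHYLS: dict[str, frozenset[int]] = {
    "alpha": frozenset({5, 7, 8}),
    "beta": frozenset({5, 8}),
    "gamma": frozenset({7, 8}),
    "delta": frozenset({8}),
}


def methylate(positions: frozenset[int], site: int) -> frozenset[int]:
    """Return the methylation pattern after adding a methyl group at `site`."""
    if site in positions:
        raise ValueError(f"position {site} is already methylated")
    return positions | {site}


def species_for(positions: frozenset[int]) -> Optional[str]:
    """Name the tocopherol species with exactly this methylation pattern, if any."""
    for name, pattern in RING_METHYLS.items():
        if pattern == positions:
            return name
    return None


# Methylation of gamma-tocopherol at position 5 gives alpha-tocopherol.
print(species_for(methylate(RING_METHYLS["gamma"], 5)))  # alpha
```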

Phytylpyrophosphate, which is the central constituent of the tocopherol side chain, is
formed from geranylgeranyl diphosphate (GGDP). GGDP is itself produced via a
biosynthetic pathway in which isopentenyl diphosphate (IPP) plays a major role. IPP is a
central intermediate in the production of isoprenoids. Two pathways that generate IPP have
been reported: a cytoplasmic pathway, referred to as the mevalonate pathway, and a
plastid-based pathway, referred to as the MEP pathway. The cytoplasmic pathway
involves the enzymes acetoacetyl-CoA thiolase, HMG-CoA synthase, HMG-CoA reductase,
mevalonate kinase, phosphomevalonate kinase, and mevalonate pyrophosphate
decarboxylase.

Evidence for the existence of an alternative, plastid-based, isoprenoid biosynthetic
pathway recently emerged from studies in the research groups of Rohmer and Arigoni, who
found that the isotope labeling patterns observed in studies on certain eubacterial and plant
terpenoids could not be explained in terms of the mevalonate pathway. Eisenreich et al.,
Chem. Bio. 5:R221-233 (1998); Rohmer, Prog. Drug. Res. 50:135-154 (1998); Rohmer, 2
Comprehensive Natural Products Chemistry 45-68, Barton and Nakanishi (eds.), Pergamon
Press, Oxford, England (1999). Arigoni and coworkers subsequently showed that
1-deoxyxylulose, or a derivative thereof, serves as an intermediate of the novel pathway, now
referred to as the MEP pathway. Rohmer et al., Biochem. J. 295:517-524 (1993); Schwarz,
Ph.D. thesis, Eidgenössische Technische Hochschule, Zurich, Switzerland (1994).
In the first step of the MEP pathway, DXP synthase, an enzyme encoded by the dxs
gene, catalyzes the formation of 1-deoxy-D-xylulose-5-phosphate (DXP) from one molecule
each of D-glyceraldehyde-3-phosphate and pyruvate. DXP is then converted into
2-C-methyl-D-erythritol-4-phosphate (MEP) by DXP reductoisomerase, which is encoded by the
dxr gene. The conversion of MEP into 4-diphosphocytidyl-2-C-methyl-D-erythritol
(CDP-ME) is catalyzed by CDP-ME synthase, which is encoded by the ygbP gene. CDP-ME
kinase, which is encoded by the ychB gene, catalyzes the conversion of CDP-ME into
4-diphosphocytidyl-2-C-methyl-D-erythritol 2-phosphate (CDP-MEP). CDP-MEP is then
converted into 2-C-methyl-D-erythritol-2,4-cyclodiphosphate by ME-CDP synthase, which is
encoded by the ygbB gene. The ygbP and ygbB genes are tightly linked on the E. coli
genome. Herz et al., PNAS 97(6):2485-2490 (2000).
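The five reactions described above form a linear chain, which can be captured as a small data structure. The sketch below (the `Step` class and `check_chain` helper are illustrative names, not from any library) records each step with the E. coli gene name, enzyme, substrates and product exactly as given in the text, and checks that every step consumes the product of the step before it. The reaction involving GCPE is not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    gene: str                    # E. coli gene name used in the text
    enzyme: str
    substrates: tuple[str, ...]
    product: str


MEP_STEPS = (
    Step("dxs", "DXP synthase", ("D-glyceraldehyde-3-phosphate", "pyruvate"), "DXP"),
    Step("dxr", "DXP reductoisomerase", ("DXP",), "MEP"),
    Step("ygbP", "CDP-ME synthase", ("MEP",), "CDP-ME"),
    Step("ychB", "CDP-ME kinase", ("CDP-ME",), "CDP-MEP"),
    Step("ygbB", "ME-CDP synthase", ("CDP-MEP",),
         "2-C-methyl-D-erythritol-2,4-cyclodiphosphate"),
)


def check_chain(steps: tuple[Step, ...]) -> None:
    """Raise ValueError if a step does not consume the previous step's product."""
    for prev, step in zip(steps, steps[1:]):
        if prev.product not in step.substrates:
            raise ValueError(f"{step.gene} does not consume {prev.product}")


check_chain(MEP_STEPS)
for step in MEP_STEPS:
    print(f"{step.gene:5} {' + '.join(step.substrates)} -> {step.product}")
```

Keeping the steps as an ordered tuple makes the linear order of the pathway explicit, so a missing or misordered reaction is caught by `check_chain`.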

Identification of further genes included in the MEP pathway will provide new 
approaches to increasing tocopherol levels in plants, which is a topic of the present 
application. 


SUMMARY OF THE INVENTION 
The present invention provides a novel gene essential to the MEP pathway: gcpE.
gcpE is tightly linked to ygbP and ygbB. Expression of the GCPE protein in organisms such as
plants can increase the biosynthesis of tocopherol precursors such as isopentenyl diphosphate
(IPP) and dimethylallyl diphosphate (DMAPP). The present invention also provides
transgenic organisms expressing a GCPE protein, which can nutritionally enhance food and
feed sources.

In particular, the present invention includes and provides a substantially purified
nucleic acid molecule that encodes a protein comprising an amino acid sequence selected
from the group consisting of SEQ ID NOs: 4 and 48 through 50. The present invention also
includes and provides a substantially purified nucleic acid molecule that encodes a protein
comprising an amino acid sequence of SEQ ID NO: 4. Further provided by the present
invention is a substantially purified nucleic acid molecule that encodes a protein comprising
an amino acid sequence of SEQ ID NO: 48.

The present invention includes and provides a substantially purified nucleic acid
molecule that encodes a protein comprising an amino acid sequence of SEQ ID NO: 49. The
present invention also includes and provides a substantially purified nucleic acid molecule
that encodes a protein comprising an amino acid sequence of SEQ ID NO: 50. Further
provided by the present invention is a substantially purified nucleic acid molecule that
encodes a GCPE protein, where the nucleic acid molecule comprises a nucleic acid sequence
selected from the group consisting of SEQ ID NOs: 1 through 3, 5 through 47, and
complements thereof.

WO 02/12478 PCT/US01/24335

The present invention includes and provides a recombinant nucleic acid molecule
comprising as operably linked components: (A) a promoter; and (B) a heterologous nucleic
acid molecule that encodes an amino acid sequence selected from the group consisting of SEQ ID
NOs: 4 and 48 through 50. The present invention also includes and provides transformed
cells comprising such nucleic acid molecules.

Further provided by the present invention is a transgenic plant comprising a
recombinant nucleic acid molecule comprising as operably linked components: (A) a
promoter; and (B) a heterologous nucleic acid molecule that encodes an amino acid sequence
selected from the group consisting of SEQ ID NOs: 4 and 48 through 50.

The present invention includes and provides such a transgenic plant that exhibits an
increased tocopherol level relative to a plant with a similar genetic background but lacking
the recombinant nucleic acid molecule. Also provided are seeds derived from such
transgenic plants, and oil derived from such seeds. The present invention includes and
provides such a transgenic plant that exhibits an increased monoterpene level relative to a
plant with a similar genetic background but lacking the recombinant nucleic acid molecule.
The present invention includes and provides such a transgenic plant that exhibits an increased
carotenoid level relative to a plant with a similar genetic background but lacking the
recombinant nucleic acid molecule. The present invention includes and provides such a
transgenic plant that exhibits an increased tocotrienol level relative to a plant with a similar
genetic background but lacking the recombinant nucleic acid molecule.

The present invention includes and provides such a transgenic plant that produces a
seed with an increased tocopherol level relative to a plant with a similar genetic background
but lacking the recombinant nucleic acid molecule. The present invention includes and
provides such a transgenic plant that produces a seed with an increased monoterpene level
relative to a plant with a similar genetic background but lacking the recombinant nucleic acid
molecule. The present invention includes and provides such a transgenic plant that produces
a seed with an increased carotenoid level relative to a plant with a similar genetic background
but lacking the recombinant nucleic acid molecule. The present invention includes and
provides such a transgenic plant that produces a seed with an increased tocotrienol level
relative to a plant with a similar genetic background but lacking the recombinant nucleic acid
molecule.

The present invention includes and provides a recombinant nucleic acid molecule
comprising as operably linked components: (A) an exogenous promoter; and (B) a nucleic
acid sequence selected from the group consisting of SEQ ID NOs: 1 through 3, 5 through 47,
and complements thereof. The present invention also includes and provides transformed
cells comprising such nucleic acid molecules.

Further provided by the present invention is a transgenic plant comprising a
recombinant nucleic acid molecule comprising as operably linked components: (A) an
exogenous promoter; and (B) a nucleic acid sequence selected from the group consisting of
SEQ ID NOs: 1 through 3, 5 through 47, and complements thereof. The present invention
includes and provides such a transgenic plant which is selected from the group consisting of
Brassica campestris, Brassica napus, canola, castor bean, coconut, cotton, crambe, linseed,
maize, mustard, oil palm, peanut, rapeseed, rice, safflower, sesame, soybean, sunflower, and
wheat. The present invention includes and provides such a transgenic plant which is selected
from the group consisting of coconut, crambe, maize, oil palm, peanut, rapeseed, safflower,
sesame, soybean, and sunflower.

The present invention further includes and provides a seed derived from such a
transgenic plant. Also provided are oil and meal derived from such seeds. The present
invention includes and provides such a seed which exhibits an increased tocopherol level
relative to seed from a plant having a similar genetic background but lacking the recombinant
nucleic acid molecule. The present invention includes and provides such a seed which
exhibits an increased α-tocopherol level relative to seed from a plant having a similar genetic
background but lacking the recombinant nucleic acid molecule. The present invention
includes and provides such a seed which exhibits an increased monoterpene level relative to
seed from a plant having a similar genetic background but lacking the recombinant nucleic
acid molecule. The present invention includes and provides such a seed which exhibits an
increased carotenoid level relative to seed from a plant having a similar genetic background
but lacking the recombinant nucleic acid molecule. The present invention includes and
provides such a seed which exhibits an increased tocotrienol level relative to seed from a
plant having a similar genetic background but lacking the recombinant nucleic acid molecule.

The present invention includes and provides a recombinant nucleic acid molecule
comprising as operably linked components: (A) a promoter that functions in a plant cell to
cause production of an mRNA molecule; and (B) a nucleic acid sequence that hybridizes
under moderate stringency conditions to a nucleic acid sequence selected from the group
consisting of SEQ ID NOs: 1 through 3, 5 through 47, and complements thereof.

The present invention includes and provides a recombinant nucleic acid molecule
comprising as operably linked components: (A) a promoter that functions in a plant cell to
cause production of an mRNA molecule; and (B) a nucleic acid sequence that has greater
than 85% identity to a nucleic acid sequence selected from the group consisting of SEQ ID
NOs: 1 through 3, 5 through 47, and complements thereof.
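Percent identity is the quantity behind the 85% threshold above. The sketch below computes it for two sequences that have already been aligned, using one common convention (identical columns divided by alignment length, ignoring columns that are gaps in both sequences). It is an assumption for illustration, not the GCG BestFit program named in the Definitions, which performs its own local alignment and may treat gaps differently; the function names and toy sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    '-' marks a gap. Identical non-gap columns count as matches; columns
    that are gaps in both sequences are ignored.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper())
               if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def exceeds_threshold(a: str, b: str, threshold: float = 85.0) -> bool:
    """True if the aligned pair has greater than `threshold` percent identity."""
    return percent_identity(a, b) > threshold


print(percent_identity("ACGTACGTAC", "ACGTACGTTC"))   # 90.0
print(exceeds_threshold("ACGTACGTAC", "ACGTACGTTC"))  # True
```

With one mismatch in ten aligned positions the pair is 90% identical and passes the "greater than 85%" test; different alignment programs can report different values for the same pair of sequences.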

The present invention includes and provides a substantially purified protein
comprising an amino acid sequence selected from the group consisting of SEQ ID NOs: 4,
48, and 49. The present invention also includes and provides an antibody capable of
specifically binding a protein comprising an amino acid sequence selected from the group
consisting of SEQ ID NOs: 4, 48, and 49.

The present invention includes and provides a transgenic plant comprising a nucleic
acid molecule that encodes a GCPE protein, where the nucleic acid molecule comprises a
promoter operably linked to a heterologous nucleic acid sequence selected from the group
consisting of SEQ ID NOs: 1 through 3, 5 through 47, and complements thereof. The present
invention includes and provides such a transgenic plant where the promoter is a
seed-specific promoter. The present invention includes and provides such a transgenic plant
where the seed-specific promoter is selected from the group consisting of napin, phaseolin,
zein, soybean trypsin inhibitor, ACP, stearoyl-ACP desaturase, soybean α' subunit of
β-conglycinin (soy 7S), and oleosin promoters.

The present invention includes and provides such a transgenic plant, where the plant
exhibits an increased isoprenoid compound level relative to a plant with a similar genetic
background but lacking the heterologous nucleic acid sequence. The present invention
includes and provides such a transgenic plant, where the isoprenoid compound is selected
from the group consisting of tocotrienols, tocopherols, terpenes, gibberellins, carotenoids,
and xanthophylls. The present invention includes and provides such a transgenic plant,
where the isoprenoid compound is a monoterpene. The present invention includes and
provides such a transgenic plant, where the isoprenoid compound is selected from the group
consisting of IPP and DMAPP. The present invention includes and provides such a
transgenic plant, where the plant exhibits an increased tocopherol level relative to a plant
with a similar genetic background but lacking the heterologous nucleic acid sequence. Also
included and provided are feedstock, plant parts, and seeds derived from such plants. Further
provided are containers of such seeds.

The present invention includes and provides a method of producing a transgenic
plant with an increased isoprenoid compound level comprising: (A) transforming the plant
with a nucleic acid molecule to produce a transgenic plant, where the nucleic acid molecule
comprises a nucleic acid sequence selected from the group consisting of SEQ ID NOs: 1
through 3, 5 through 47, and complements thereof; and (B) growing the transgenic plant.

The present invention includes and provides a method of producing a transgenic
plant having seed with an increased isoprenoid compound level comprising: (A) transforming
the plant with a nucleic acid molecule to produce a transgenic plant, where the nucleic acid
molecule encodes a protein with an amino acid sequence selected from the group consisting
of SEQ ID NOs: 4 and 48-50; and (B) growing the transgenic plant.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 sets forth chemical compounds that were determined not to be GCPE reaction
products.

Figure 2 sets forth the diacetate of 2-methylbut-2-ene-1,4-diol.

Figure 3 sets forth (E)-1-(4-hydroxy-3-methylbut-2-enyl) diphosphate.

Figure 4 sets forth an alignment between proteins encoded by the gcpE gene from E.
coli (SEQ ID NO: 78) and clone 135H1 from A. thaliana (SEQ ID NO: 79).

Figure 5 sets forth cloning of a truncated Arabidopsis cDNA to create pQE-AGH. 

DESCRIPTION OF THE NUCLEIC AND AMINO ACID SEQUENCES 



15 


SEQ ID NO: 1 is an Arabidopsis thaliana nucleotide sequence of a gcpE gene.

SEQ ID NO: 2 is a rice nucleotide sequence of a gcpE gene.

SEQ ID NO: 3 is an E. coli nucleotide sequence of a gcpE gene.

SEQ ID NO: 4 is an amino acid sequence derived from a rice gcpE gene.

SEQ ID NO: 5 is a partial A. thaliana nucleotide sequence of a gcpE gene.

SEQ ID NO: 6 is a partial soybean nucleotide sequence of a gcpE gene.

SEQ ID NO: 7 is a partial tomato nucleotide sequence of a gcpE gene.

SEQ ID NO: 8 is a partial Mesembryanthemum crystallinum nucleotide sequence of a gcpE gene.

SEQ ID NO: 9 is a partial rice nucleotide sequence of a gcpE gene.

SEQ ID NO: 10 is a partial maize nucleotide sequence of a gcpE gene.

SEQ ID NO: 11 is a partial Loblolly pine nucleotide sequence of a gcpE gene.

SEQ ID NO: 12 is a partial Physcomitrella patens nucleotide sequence of a gcpE gene.

SEQ ID NOs: 13 through 20 are partial A. thaliana nucleotide sequences of a gcpE gene.

SEQ ID NOs: 21 through 32 are partial maize nucleotide sequences of a gcpE gene.

SEQ ID NOs: 33 through 46 are partial soybean nucleotide sequences of a gcpE gene.

SEQ ID NO: 47 is a partial Brassica napus nucleotide sequence of a gcpE gene.

SEQ ID NO: 48 is an amino acid sequence derived from an A. thaliana gcpE gene.

SEQ ID NO: 49 is an amino acid sequence derived from a rice gcpE gene.

SEQ ID NO: 50 is an amino acid sequence derived from an E. coli gcpE gene.

SEQ ID NOs: 51 through 77 are primer nucleotide sequences.

SEQ ID NO: 78 is an E. coli amino acid sequence derived from the gcpE gene.

SEQ ID NO: 79 is an A. thaliana amino acid sequence derived from clone 135H1.

SEQ ID NO: 80 is a partial A. thaliana nucleotide sequence of a gcpE gene.

SEQ ID NO: 81 is an amino acid sequence derived from an A. thaliana gcpE gene.

SEQ ID NO: 82 is a partial A. thaliana nucleotide sequence of a gcpE gene.

SEQ ID NO: 83 is an amino acid sequence derived from an A. thaliana gcpE gene.

SEQ ID NO: 84 is a partial A. thaliana nucleotide sequence of a gcpE gene.

SEQ ID NO: 85 is an amino acid sequence derived from an A. thaliana gcpE gene.

DEFINITIONS 

The following definitions are provided as an aid to understanding the detailed 
description of the present invention. 

The abbreviation "EP" refers to patent applications and patents published by the
European Patent Office, and the term "WO" refers to patent applications published by the
World Intellectual Property Organization. "PNAS" refers to Proc. Natl. Acad. Sci. (U.S.A.).

"Amino acid" and "amino acids" refer to all naturally occurring L-amino acids. This 
definition is meant to include norleucine, norvaline, ornithine, homocysteine, and 
homoserine. 

"Chromosome walking" means a process of extending a genetic map by successive
hybridization steps.

The phrases "coding sequence," "structural sequence," and "structural nucleic acid
sequence" refer to a physical structure comprising an orderly arrangement of nucleic acids.
The coding sequence, structural sequence, and structural nucleic acid sequence may be
contained within a larger nucleic acid molecule, vector, or the like. In addition, the orderly
arrangement of nucleic acids in these sequences may be depicted in the form of a sequence
listing, figure, table, electronic medium, or the like.

A nucleic acid molecule is said to be the "complement" of another nucleic acid
molecule if they exhibit complete complementarity, i.e., every nucleotide of one of the
molecules is complementary to a nucleotide of the other. Two molecules are "minimally
complementary" if they can hybridize to one another with sufficient stability to remain
annealed to one another under at least conventional "low-stringency" conditions. Similarly,
the molecules are "complementary" if they can hybridize to one another with sufficient
stability to remain annealed to one another under conventional "high-stringency" conditions.
Conventional stringency conditions are described by Sambrook et al., Molecular Cloning: A
Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press, Cold Spring
Harbor, N.Y. (1989); Haymes et al., Nucleic Acid Hybridization, A Practical Approach, IRL
Press, Washington, DC (1985).
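Complete complementarity, as defined above, can be checked directly when both molecules are written 5' to 3': because the strands are antiparallel, one must equal the reverse complement of the other. The sketch below assumes DNA sequences containing only A, C, G and T (no U or ambiguity codes), and it does not model hybridization stability, so it says nothing about the low- and high-stringency notions of "minimally complementary" and "complementary". The function names are illustrative.

```python
# Watson-Crick pairing for DNA; RNA (U) and ambiguity codes are not handled.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5'->3' (i.e., reversed)."""
    seq = seq.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("only A, C, G and T are supported")
    return seq.translate(_COMPLEMENT)[::-1]


def is_complement(a: str, b: str) -> bool:
    """True if every nucleotide of `a` pairs with its antiparallel partner in `b`."""
    return len(a) == len(b) and reverse_complement(a) == b.upper()


print(reverse_complement("ATGC"))      # GCAT
print(is_complement("AACG", "CGTT"))   # True
```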

The phrases "DNA sequence," "nucleic acid sequence," and "nucleic acid molecule"
refer to a physical structure comprising an orderly arrangement of nucleic acids. The DNA
sequence or nucleic acid sequence may be contained within a larger nucleic acid molecule,
vector, or the like. In addition, the orderly arrangement of nucleic acids in these sequences
may be depicted in the form of a sequence listing, figure, table, electronic medium, or the
like. "Nucleic acid" refers to deoxyribonucleic acid (DNA) and ribonucleic acid (RNA).

An "elite soybean line" is any soybean line that has resulted from breeding and
selection for superior agronomic performance. Elite soybean lines are commercially
available to farmers or soybean breeders, e.g., HARTZ™ variety H4452 Roundup Ready™
(HARTZ SEED, Stuttgart, Arkansas, USA); QP4544 (Asgrow Seeds, Des Moines, Iowa,
USA); DeKalb variety CX445 (DeKalb, Illinois).

"Exogenous genetic material" is any genetic material, whether naturally occurring or 
otherwise, from any source that is capable of being inserted into any organism. 

The term "expression" refers to the transcription of a gene to produce the
corresponding mRNA and translation of this mRNA to produce the corresponding gene
product (i.e., a peptide, polypeptide, or protein). The term "expression of antisense RNA"
refers to the transcription of a DNA to produce a first RNA molecule capable of hybridizing
to a second RNA molecule. Formation of the RNA-RNA hybrid inhibits translation of the
second RNA molecule to produce a gene product.

"Fungi" as used herein includes the phyla Ascomycota, Basidiomycota,
Chytridiomycota and Zygomycota, as well as the Oomycota and all mitosporic fungi, and
"filamentous fungi" include all filamentous forms of the subdivision Eumycota and
Oomycota. These terms are defined in Hawksworth et al., in: Ainsworth and Bisby's
Dictionary of The Fungi, 8th edition, CAB International, University Press, Cambridge, UK
(1995).
"Homology" refers to the level of similarity between two or more nucleic acid or
amino acid sequences in terms of percent of positional identity (i.e., sequence similarity or
identity). Homology also refers to the concept of similar functional properties among
different nucleic acids or proteins.

As used herein, a "homolog protein" molecule or fragment thereof is a counterpart
protein molecule or fragment thereof in a second species (e.g., maize GCPE is a homolog of
Arabidopsis GCPE). A homolog can also be generated by molecular evolution or DNA
shuffling techniques, so that the molecule retains at least one functional or structural
characteristic of the original protein (see, e.g., U.S. Patent No. 5,811,238).

The phrase "heterologous" refers to the relationship between two or more nucleic
acid or protein sequences that are derived from different sources. For example, a promoter is
heterologous with respect to a coding sequence if such a combination is not normally found
in nature. In addition, a particular sequence may be "heterologous" with respect to a cell or
organism into which it is inserted (i.e., it does not naturally occur in that particular cell or
organism).

"Hybridization" refers to the ability of a strand of nucleic acid to join with a 
complementary strand via base pairing. Hybridization occurs when complementary nucleic 
acid sequences in the two nucleic acid strands contact one another under appropriate 
conditions. 

The "MEP pathway" is the pathway associated with the biosynthesis of isopentenyl
diphosphate or dimethylallyl diphosphate in which deoxy-D-xylulose-5-phosphate or a
derivative thereof serves as an intermediate.

The phrase "operably linked" refers to the functional spatial arrangement of two or
more nucleic acid regions or nucleic acid sequences. For example, a promoter region may be
positioned relative to a nucleic acid sequence such that transcription of the nucleic acid
sequence is directed by the promoter region. Thus, the promoter region is "operably linked" to
the nucleic acid sequence.

"Phenotype" refers to traits exhibited by an organism resulting from the interaction
of genotype and environment, such as disease resistance, pest tolerance, environmental
tolerance (e.g., tolerance to abiotic stress), male sterility, quality improvement, or yield.

"Polyadenylation signal" or "polyA signal" refers to a nucleic acid sequence located 
3' to a coding region that promotes the addition of adenylate nucleotides to the 3' end of the 
mRNA transcribed from the coding region. 

The term "promoter" or "promoter region" refers to a nucleic acid sequence, usually
found upstream (5') of a coding sequence, that is capable of directing transcription of a
nucleic acid sequence into mRNA. The promoter or promoter region typically provides a
recognition site for RNA polymerase and the other factors necessary for proper initiation of
transcription. As contemplated herein, a promoter or promoter region includes variations of
promoters derived by inserting or deleting regulatory regions, subjecting the promoter to
random or site-directed mutagenesis, etc. The activity or strength of a promoter may be
measured in terms of the amount of RNA it produces, or the amount of protein accumulation
in a cell or tissue, relative to a promoter whose transcriptional activity has been previously
assessed.

The term "protein" or "peptide molecule" includes any molecule that comprises five
or more amino acids. It is well known in the art that proteins may undergo modification,
including post-translational modifications, such as, but not limited to, disulfide bond
formation, glycosylation, phosphorylation, or oligomerization. Thus, as used herein, the term
"protein" or "peptide molecule" includes any protein that is modified by any biological or
non-biological process.

A "protein fragment" is a peptide or polypeptide molecule whose amino acid
sequence comprises a subset of the amino acid sequence of a protein. A protein or
fragment thereof that comprises one or more additional peptide regions not derived from that
protein is a "fusion" protein.

"Recombinant vector" refers to any agent such as a plasmid, cosmid, virus, 
autonomously replicating sequence, phage, or linear single-stranded, circular single-stranded,
linear double-stranded, or circular double-stranded DNA or RNA nucleotide sequence. The
recombinant vector may be derived from any source and is capable of genomic integration or
autonomous replication.

"Regeneration" refers to the process of growing a plant from a plant cell or plant 
tissue (e.g., plant protoplast or explant).

"Regulatory sequence" refers to a nucleotide sequence located upstream (5'), within,
or downstream (3') to a coding sequence. Transcription and expression of the coding
sequence is typically impacted by the presence or absence of the regulatory sequence.

An antibody or peptide is said to "specifically bind" to a protein or peptide molecule
of the invention if such binding is not competitively inhibited by the presence of non-related
molecules.

"Substantially homologous" refers to two sequences which are at least 90% identical
in sequence, as measured by the BestFit program described herein (Version 10; Genetics
Computer Group, Inc., University of Wisconsin Biotechnology Center, Madison, WI), using
default parameters.
"Substantially purified" refers to a molecule separated from substantially all other
molecules normally associated with it in its native state. More preferably a substantially
purified molecule is the predominant species present in a preparation. A substantially
purified molecule may be greater than 60% free, preferably 75% free, more preferably 90%
free, and most preferably 95% free from the other molecules (exclusive of solvent) present in
the natural mixture. The term "substantially purified" is not intended to encompass
molecules present in their native state.

"Transcription" refers to the process of producing an RNA copy from a DNA 
template. "Transformation" refers to the introduction of nucleic acid into a recipient host.
The term "host" refers to bacterial cells, fungi, animals or animal cells, plants or seeds, or any
plant parts or tissues including plant cells, protoplasts, calli, roots, tubers, seeds, stems,
leaves, seedlings, embryos, and pollen.

"Transgenic" refers to organisms into which exogenous nucleic acid sequences are 
integrated. "Transgenic plant" refers to a plant where an introduced nucleic acid is stably
introduced into a genome of the plant, for example, the nuclear or plastid genomes.

"Vector" refers to a plasmid, cosmid, bacteriophage, or virus that carries exogenous 
DNA into a host organism. 

"Yeast" as used herein includes Ascosporogenous yeast (Endomycetales), 
Basidiosporogenous yeast and yeast belonging to the Fungi Imperfecti (Blastomycetes), as
defined in Skinner et al. (1980).

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 

One skilled in the art may refer to general reference texts for detailed descriptions of
known techniques discussed herein or equivalent techniques. These texts include Ausubel et
al., Current Protocols in Molecular Biology, John Wiley and Sons, Inc. (1995); Sambrook et
al., Molecular Cloning, A Laboratory Manual (2d ed.), Cold Spring Harbor Press, Cold
Spring Harbor, New York (1989); Birren et al., Genome Analysis: A Laboratory Manual,
volumes 1 through 4, Cold Spring Harbor Press, Cold Spring Harbor, New York (1997-
1999); Plant Molecular Biology: A Laboratory Manual, Clark (ed.), Springer, New York
(1997); Richards et al., Plant Breeding Systems (2d ed.), Chapman & Hall, The University
Press, Cambridge (1997); and Maliga et al., Methods in Plant Molecular Biology, Cold
Spring Harbor Press, Cold Spring Harbor, New York (1995). These texts can, of course, also
be referred to in making or using an aspect of the invention.

Utilizing a methodology for the isolation and characterization of essential MEP
pathway genes, an essential and novel gene, termed gcpE, was isolated. gcpE is tightly
linked to ygbP and ygbB, which are other MEP pathway genes. Because GCPE is an essential
MEP pathway component, enhanced expression or overexpression of GCPE in a variety of
organisms, such as plants, can result in higher levels of tocopherol precursors such as IPP and
DMAPP and, ultimately, in enhanced levels of tocopherols in such organisms. Moreover, the
present invention provides a number of agents, for example, nucleic acid molecules encoding
a GCPE protein, and provides uses of such agents.

The agents of the invention will preferably be "biologically active" with respect to
either a structural attribute, such as the capacity of a nucleic acid to hybridize to another
nucleic acid molecule, or the ability of a protein to be bound by an antibody (or to compete
with another molecule for such binding). Alternatively, such an attribute may be catalytic
and thus involve the capacity of the agent to mediate a chemical reaction or response. The
agents will preferably be substantially purified. The agents of the invention may also be
recombinant.

It is understood that any of the agents of the invention can be substantially purified
and/or be biologically active and/or recombinant. It is also understood that the agents of the
invention may be labeled with reagents that facilitate detection of the agent, e.g., fluorescent
labels, chemical labels, modified bases, and the like.

A. Nucleic Acid Molecules 

Agents of the invention include nucleic acid molecules. In a preferred aspect of the
present invention, the nucleic acid molecule comprises a nucleic acid sequence that encodes
a GCPE protein. In a preferred embodiment, the GCPE protein is derived from an organism
having a MEP pathway. Examples of GCPE proteins are those proteins having an amino
acid sequence selected from the group consisting of SEQ ID NOs: 4, 48, 49, and 50.

In another preferred aspect of the present invention, the nucleic acid molecule
comprises a nucleic acid sequence that is selected from: (1) any of SEQ ID NOs: 1 through 3,
5 through 47, complements thereof, or fragments of these sequences; (2) the group consisting
of SEQ ID NOs: 1, 2, complements thereof, and fragments of these sequences; (3) the group
consisting of SEQ ID NOs: 1, 2, 3, complements thereof and fragments of these sequences;
(4) the group consisting of SEQ ID NOs: 1, 2, 13 through 47, complements thereof and
fragments of these sequences; (5) the group consisting of SEQ ID NOs: 5 through 12,
complements thereof and fragments of these sequences; or (6) the group consisting of SEQ
ID NOs: 1 through 3, 5 through 47, complements thereof and fragments of these sequences.

In a further aspect of the present invention, the nucleic acid molecule comprises a
nucleic acid sequence encoding an amino acid sequence selected from: (1) any of SEQ ID
NOs: 4, 48, 49, or 50; (2) the group consisting of SEQ ID NOs: 4, 48, and 49 and fragments of
these sequences; or (3) the group consisting of SEQ ID NOs: 4, 48, 49, 50 and fragments of
these sequences.

It is understood that, in a further aspect, the nucleic acid sequences of the present
invention can encode a protein that differs from any of the above proteins in that amino acids
have been deleted, substituted, or added without altering the function. For example, codons
capable of coding for such conservative amino acid substitutions are known in the art.

The present invention provides nucleic acid molecules that hybridize to the above-
described nucleic acid molecules. Nucleic acid hybridization is a technique well known to
those of skill in the art of DNA manipulation. The hybridization properties of a given pair of
nucleic acids are an indication of their similarity or identity.

The nucleic acid molecules preferably hybridize, under low, moderate, or high
stringency conditions, with a nucleic acid sequence selected from: (1) any of SEQ ID NOs: 1
through 3, 5 through 47, or complements thereof; (2) the group consisting of SEQ ID NOs: 1,
2, and complements thereof; (3) the group consisting of SEQ ID NOs: 1, 2, 3, and
complements thereof; (4) the group consisting of SEQ ID NOs: 1, 2, 13 through 47, and
complements thereof; (5) the group consisting of SEQ ID NOs: 5 through 12, and
complements thereof; or (6) the group consisting of SEQ ID NOs: 1 through 3, 5 through 47,
and complements thereof. Fragments of these sequences are also contemplated.

The hybridization conditions typically involve nucleic acid hybridization in about
0.1X to about 10X SSC (diluted from a 20X SSC stock solution containing 3 M sodium
chloride and 0.3 M sodium citrate, pH 7.0 in distilled water), about 2.5X to about 5X
Denhardt's solution (diluted from a 50X stock solution containing 1% (w/v) bovine serum
albumin, 1% (w/v) ficoll, and 1% (w/v) polyvinylpyrrolidone in distilled water), about 10
mg/mL to about 100 mg/mL fish sperm DNA, and about 0.02% (w/v) to about 0.1% (w/v)
SDS, with an incubation at about 20°C to about 70°C for several hours to overnight. The
stringency conditions are preferably provided by 6X SSC, 5X Denhardt's solution, 100
mg/mL fish sperm DNA, and 0.1% (w/v) SDS, with an incubation at 55°C for several hours.
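The stock dilutions in the recipe above follow the usual C1V1 = C2V2 relation. The following is a minimal Python sketch, assuming the 20X SSC and 50X Denhardt's stocks exactly as described in the text; the function names and the 100 mL batch volume are illustrative choices, not part of any stated protocol, and the SDS and fish sperm DNA components are left out because no stock concentration is given for them.

```python
from typing import Tuple

# Stock compositions as described in the text.
SSC_STOCK_X = 20.0          # 20X SSC stock
SSC_STOCK_NACL_M = 3.0      # 3 M sodium chloride in 20X SSC
SSC_STOCK_CITRATE_M = 0.3   # 0.3 M sodium citrate in 20X SSC
DENHARDT_STOCK_X = 50.0     # 50X Denhardt's stock


def stock_volume_ml(final_x: float, stock_x: float, final_volume_ml: float) -> float:
    """Volume of stock (mL) such that stock_x * V1 = final_x * final_volume_ml."""
    if not 0 <= final_x <= stock_x:
        raise ValueError("final concentration must lie between 0 and the stock concentration")
    return final_x * final_volume_ml / stock_x


def ssc_molarity(x: float) -> Tuple[float, float]:
    """Return the (NaCl, sodium citrate) molarity of an x-fold SSC solution."""
    scale = x / SSC_STOCK_X
    return SSC_STOCK_NACL_M * scale, SSC_STOCK_CITRATE_M * scale


if __name__ == "__main__":
    batch_ml = 100.0  # hypothetical batch size
    print("20X SSC for 6X:", stock_volume_ml(6, SSC_STOCK_X, batch_ml), "mL")
    print("50X Denhardt's for 5X:", stock_volume_ml(5, DENHARDT_STOCK_X, batch_ml), "mL")
    print("6X SSC (NaCl M, citrate M):", ssc_molarity(6))
```

For a 100 mL batch of the preferred hybridization solution this gives 30 mL of 20X SSC and 10 mL of 50X Denhardt's solution, and 6X SSC works out to about 0.9 M sodium chloride and 0.09 M sodium citrate.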

The hybridization is generally followed by several wash steps. The wash
compositions generally comprise 0.1X to about 10X SSC and 0.01% (w/v) to about 0.5%
(w/v) SDS, with a 15 minute incubation at about 20°C to about 70°C. Preferably, the nucleic
acid segments remain hybridized after washing at least one time in 0.1X SSC at 65°C. For
example, the salt concentration in the wash step can be selected from a low stringency of
about 2.0 X SSC at 50°C to a high stringency of about 0.2 X SSC at 65°C. In addition, the
temperature in the wash step can be increased from low stringency conditions at room
temperature, about 22°C, to high stringency conditions at about 65°C. Both temperature and
salt may be varied, or either the temperature or the salt concentration may be held constant
while the other variable is changed.
Low stringency conditions may be used to select nucleic acid sequences with lower
sequence identities to a target nucleic acid sequence. One may wish to employ conditions
such as about 6.0 X SSC to about 10 X SSC, at temperatures ranging from about 20°C to
about 55°C, and preferably a nucleic acid molecule will hybridize to one or more of the
above-described nucleic acid molecules under low stringency conditions of about 6.0 X SSC
and about 45°C. In a preferred embodiment, a nucleic acid molecule will hybridize to one or
more of the above-described nucleic acid molecules under moderately stringent conditions,
for example at about 2.0 X SSC and about 65°C. In a particularly preferred embodiment, a
nucleic acid molecule of the present invention will hybridize to one or more of the above-
described nucleic acid molecules under high stringency conditions such as 0.2 X SSC and
about 65°C.

In an alternative embodiment, the nucleic acid molecule comprises a nucleic acid
sequence that is greater than 85% identical, and more preferably greater than 86, 87, 88, 89,
90, 91, 92, 93, 94, 95, 96, 97, 98, or 99% identical to a nucleic acid sequence selected from
the group consisting of SEQ ID NOs: 1 through 3 and 5 through 47, complements thereof, and
fragments of any of these sequences.

The percent identity is preferably determined using the "Best Fit" or "Gap" program
of the Sequence Analysis Software Package™ (Version 10; Genetics Computer Group, Inc.,
University of Wisconsin Biotechnology Center, Madison, WI). "Gap" utilizes the algorithm
of Needleman and Wunsch to find the alignment of two sequences that maximizes the
number of matches and minimizes the number of gaps. "BestFit" performs an optimal
alignment of the best segment of similarity between two sequences and inserts gaps to
maximize the number of matches using the local homology algorithm of Smith and
Waterman. The percent identity calculations may also be performed using the Megalign
program of the LASERGENE bioinformatics computing suite (default parameters,
DNASTAR Inc., Madison, Wisconsin). The percent identity is most preferably determined
using the "Best Fit" program using default parameters.
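The two alignment strategies named above, global (Needleman and Wunsch) and local (Smith and Waterman), can be illustrated with a compact dynamic-programming sketch in Python. It is not the GCG or DNASTAR implementation: it assumes a simple match/mismatch score and a linear gap penalty (the default values below are arbitrary), whereas Gap and BestFit use their own scoring tables and gap creation/extension penalties, so identities computed here need not match those programs. Percent identity is taken here as identical columns divided by alignment length, which is only one of several conventions; the function names are hypothetical.

```python
from typing import List, Tuple


def align(a: str, b: str, match: int = 1, mismatch: int = -1,
          gap: int = -2, local: bool = False) -> Tuple[str, str]:
    """Global (Needleman-Wunsch) or, with local=True, local (Smith-Waterman)
    alignment with a linear gap penalty. Returns the two gapped strings."""
    n, m = len(a), len(b)

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] is the best score for a[:i] against b[:j].
    score: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:  # a global alignment pays for leading gaps
        for i in range(1, n + 1):
            score[i][0] = i * gap
        for j in range(1, m + 1):
            score[0][j] = j * gap

    best, end = 0, (n, m)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cell = max(score[i - 1][j - 1] + sub(i, j),
                       score[i - 1][j] + gap,
                       score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)  # a local alignment may restart anywhere
                if cell > best:
                    best, end = cell, (i, j)
            score[i][j] = cell

    if local and best == 0:
        return "", ""

    # Trace back from the end cell, preferring diagonal moves.
    i, j = end
    top: List[str] = []
    bottom: List[str] = []
    while i > 0 or j > 0:
        if local and score[i][j] == 0:
            break
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


def percent_identity(top: str, bottom: str) -> float:
    """Identical, non-gap columns as a percentage of alignment length."""
    if not top:
        return 0.0
    same = sum(1 for x, y in zip(top, bottom) if x == y and x != "-")
    return 100.0 * same / len(top)


if __name__ == "__main__":
    t, b = align("GATTACA", "GATACA")
    print(t, b, round(percent_identity(t, b), 1))
    print(align("TTTACGTTT", "ACG", local=True))
```

With these toy inputs the global alignment has 6 identical columns out of 7 (about 85.7% identity), and the local alignment isolates the shared segment "ACG".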

The present invention also provides nucleic acid molecule fragments that hybridize
to the above-described nucleic acid molecules and complements thereof, fragments of
nucleic acid molecules that exhibit greater than 80%, 85%, 90%, 95% or 99% sequence
identity with the above-described nucleic acid molecules and complements thereof, or
fragments of any of these molecules.

Fragment nucleic acid molecules may consist of significant portion(s) of, or indeed
most of, the nucleic acid molecules of the invention. In an embodiment, the fragments of a
nucleic acid molecule of the present invention are between about 3000 and about 1000
consecutive nucleotides, about 1800 and about 150 consecutive nucleotides, about 1500 and
about 500 consecutive nucleotides, about 1300 and about 250 consecutive nucleotides, about
1000 and about 200 consecutive nucleotides, about 800 and about 150 consecutive
nucleotides, about 500 and about 100 consecutive nucleotides, about 300 and about 75
consecutive nucleotides, about 100 and about 50 consecutive nucleotides, about 50 and about
25 consecutive nucleotides, or about 20 and about 10 consecutive nucleotides long.

In another embodiment, the fragment comprises at least 20, 30, 40, 50, 60, 70, 80, 
90, 100, 150, 200, 250, 500, or 750 consecutive nucleotides of a nucleic acid sequence of the 
present invention. 
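A fragment in this sense is simply a window of consecutive nucleotides. The short Python sketch below, using a made-up toy sequence rather than any SEQ ID NO, enumerates every fragment of a given length and checks whether a candidate qualifies under a minimum-length threshold such as the 20-nucleotide floor mentioned above; the function names are illustrative only.

```python
from typing import Iterator, Tuple


def fragments(seq: str, length: int) -> Iterator[Tuple[int, str]]:
    """Yield (0-based start, fragment) for every run of `length` consecutive
    nucleotides in `seq`; a sequence of N nucleotides has N - length + 1 of them."""
    if length <= 0:
        raise ValueError("length must be positive")
    for start in range(len(seq) - length + 1):
        yield start, seq[start:start + length]


def is_fragment(candidate: str, seq: str, min_length: int = 20) -> bool:
    """True if `candidate` is a run of at least `min_length` consecutive nucleotides of `seq`."""
    return len(candidate) >= min_length and candidate in seq


toy = "ATGGCTAGCAAGGAGGAGCTGTTCACC"  # hypothetical 27-nt sequence for illustration

if __name__ == "__main__":
    print(sum(1 for _ in fragments(toy, 25)))  # 27 - 25 + 1 = 3 windows
    print(is_fragment(toy[3:25], toy))         # 22 consecutive nucleotides
```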

Exemplary Uses

Nucleic acid molecules of the invention and fragments thereof may be employed to
obtain other nucleic acid molecules from the same species (e.g., nucleic acid molecules from
maize may be utilized to obtain other nucleic acid molecules from maize). Exemplary
nucleic acid molecules that may be obtained include, but are not limited to, nucleic acid
molecules that encode the complete coding sequence of a protein, promoters and flanking
sequences of such molecules, and nucleic acid molecules that encode other isozymes or
gene family members.

Nucleic acid molecules of the invention and fragments thereof may also be employed
to obtain nucleic acid homologs. Such homologs include the nucleic acid molecules of other
plants or other organisms, including the nucleic acid molecules that encode, in whole or in
part, protein homologs of other plant species or other organisms, or sequences of genetic
elements, such as promoters and transcriptional regulatory elements.

Promoters that may be isolated include, but are not limited to, promoters having cell-
enhanced, cell-specific, tissue-enhanced, tissue-specific, developmentally regulated, or
environmentally regulated expression profiles. Promoters obtained utilizing the nucleic acid
molecules of the invention could also be modified to affect their control characteristics.
Examples of such modifications include, but are not limited to, the addition of enhancer
sequences. Such genetic elements could be used to enhance gene expression of new and
existing traits for crop improvement.
The above-described molecules can be readily obtained by using the above-described
nucleic acid molecules or fragments thereof to screen cDNA or genomic libraries obtained
from such plant species. These methods are known to those of skill in the art, as are methods
for forming such libraries. In one embodiment, such sequences are obtained by incubating
nucleic acid molecules of the present invention with members of genomic libraries and
recovering clones that hybridize to such nucleic acid molecules. In a second
embodiment, methods of chromosome walking or inverse PCR may be used to obtain such
sequences.

Any of a variety of methods may be used to obtain one or more of the above-
described nucleic acid molecules. Automated nucleic acid synthesizers may be employed for
this purpose. In lieu of such synthesis, the disclosed nucleic acid molecules may be used to
define a pair of primers that can be used with the polymerase chain reaction to amplify and
obtain any desired nucleic acid molecule or fragment.
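Defining a primer pair from a known sequence amounts to taking a stretch of the top strand as the forward primer and the reverse complement of a downstream stretch as the reverse primer. The Python sketch below illustrates that logic together with a naive in silico amplification check. It assumes exact matching and ignores primer melting temperature, secondary structure, and mismatches; the function names and the toy template are hypothetical, and the 6-nucleotide primers are used only to keep the example short.

```python
from typing import Optional, Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement each base and reverse, giving the opposite strand read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def primer_pair(template: str, start: int, end: int, length: int = 20) -> Tuple[str, str]:
    """Primers flanking template[start:end]: the forward primer copies its first
    `length` bases; the reverse primer is the reverse complement of its last `length`."""
    if end - start < 2 * length:
        raise ValueError("target region is too short for non-overlapping primers")
    forward = template[start:start + length]
    reverse = reverse_complement(template[end - length:end])
    return forward, reverse


def amplicon(template: str, forward: str, reverse: str) -> Optional[str]:
    """Return the product expected from exact primer matches on `template`, or None."""
    i = template.find(forward)
    if i < 0:
        return None
    # The reverse primer anneals where its reverse complement sits on the top strand.
    j = template.find(reverse_complement(reverse), i + len(forward))
    if j < 0:
        return None
    return template[i:j + len(reverse)]


if __name__ == "__main__":
    toy = "CCCCATGGCTAGCAAGGAGGAGCTGTTCACCGGGGTTTT"  # hypothetical template
    fwd, rev = primer_pair(toy, 4, 31, length=6)
    print(fwd, rev, amplicon(toy, fwd, rev))
```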

In a preferred embodiment, nucleic acid molecules having SEQ ID NOs: 1 through 3
and 5 through 47, and complements thereof, and fragments of any of these sequences can be
utilized to obtain such homologs. Such homolog molecules may differ in their nucleotide
sequences from those found in one or more of SEQ ID NOs: 1 through 3 and 5 through 47, or
complements thereof, because complete complementarity is not needed for stable
hybridization. The nucleic acid molecules of the invention therefore also include molecules
that, although capable of specifically hybridizing with the nucleic acid molecules, may lack
"complete complementarity."

In a preferred embodiment, the molecules are obtained from alfalfa, apple,
Arabidopsis, banana, barley, Brassica, Brassica campestris, Brassica napus, broccoli,
cabbage, canola, castor bean, chrysanthemum, citrus, coconut, coffee, cotton, crambe,
cranberry, cucumber, Cuphea, dendrobium, dioscorea, eucalyptus, fescue, fir, garlic,
gladiolus, grape, hordeum, lentils, lettuce, liliacea, linseed, maize, millet, muskmelon,
mustard, oat, oil palm, oilseed rape, onion, an ornamental plant, papaya, pea, peanut, pepper,
perennial ryegrass, Phaseolus, pine, poplar, potato, rapeseed (including Canola and High
Erucic Acid varieties), rice, rye, safflower, sesame, sorghum, soybean, strawberry, sugarbeet,
sugarcane, sunflower, tea, tomato, triticale, turf grasses, and wheat.

In a more preferred embodiment, the molecules are obtained from Brassica
campestris, Brassica napus, canola, castor bean, coconut, cotton, crambe, linseed, maize,
mustard, oil palm, peanut, rapeseed (including Canola and High Erucic Acid varieties), rice,
safflower, sesame, soybean, sunflower, and wheat, and in a particularly preferred
embodiment from coconut, crambe, maize, oil palm, peanut, rapeseed (including Canola and
High Erucic Acid varieties), safflower, sesame, soybean, and sunflower.

The Sequence Analysis Software Package™ (Version 10; Genetics Computer Group,
Inc., University of Wisconsin Biotechnology Center, Madison, WI) contains a number of
other useful sequence analysis tools for identifying homologs of the presently disclosed
nucleotide and amino acid sequences. For example, programs such as "BLAST", "FastA",
"TfastA", "FastX", and "TfastX" can be used to search for sequences similar to a query
sequence. See, e.g., Altschul et al., Journal of Molecular Biology 215: 403-410 (1990);
Lipman and Pearson, Science 227: 1435-1441 (1985); Pearson and Lipman, Proc. Natl.
Acad. Sci. USA 85: 2444-2448 (1988); Pearson, "Rapid and Sensitive Sequence Comparison
with FASTP and FASTA" in Methods in Enzymology (R. Doolittle, ed.), 183: 63-98,
Academic Press, San Diego, California, USA (1990).

Short nucleic acid sequences having the ability to specifically hybridize to
complementary nucleic acid sequences may be produced and utilized in the present
invention, e.g., as probes to identify the presence of a complementary nucleic acid sequence
in a given sample. Alternatively, the short nucleic acid sequences may be used as
oligonucleotide primers to amplify or mutate a complementary nucleic acid sequence using
PCR technology. These primers may also facilitate the amplification of related
complementary nucleic acid sequences (e.g., related sequences from other species).

Use of these probes or primers may greatly facilitate the identification of transgenic
plants which contain the presently disclosed promoters and structural nucleic acid sequences.
Such probes or primers may also be used to screen cDNA or genomic libraries for additional
nucleic acid sequences related to or sharing homology with the presently disclosed promoters
and structural nucleic acid sequences. The probes may also be PCR probes, which are
nucleic acid molecules capable of initiating a polymerase activity while in a double-stranded
structure with another nucleic acid.

A primer or probe is generally complementary to a portion of a nucleic acid sequence
that is to be identified, amplified, or mutated and of sufficient length to form a stable and
sequence-specific duplex molecule with its complement. The primer or probe preferably is
about 10 to about 200 nucleotides long, more preferably is about 10 to about 100 nucleotides
long, even more preferably is about 10 to about 50 nucleotides long, and most preferably is
about 14 to about 30 nucleotides long.
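The length and specificity requirements above can be screened mechanically. The following Python sketch assumes exact Watson-Crick pairing and treats a probe as sequence-specific when it can pair at exactly one site on either strand of the target; it uses the most preferred 14 to 30 nucleotide window as its defaults. The function names and toy target are illustrative, and the check says nothing about duplex stability, which in practice depends on hybridization conditions such as those described earlier.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement each base and reverse the result."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_occurrences(text: str, pattern: str) -> int:
    """Count overlapping occurrences of `pattern` in `text`."""
    k = len(pattern)
    return sum(1 for i in range(len(text) - k + 1) if text[i:i + k] == pattern)


def probe_is_suitable(probe: str, target: str, min_len: int = 14, max_len: int = 30) -> bool:
    """True if the probe length is in range and it can pair at exactly one site.

    A probe pairs with the top strand wherever its reverse complement occurs and
    with the bottom strand wherever the probe sequence itself occurs. A palindromic
    probe is counted twice at the same site, so this simple check rejects it."""
    if not min_len <= len(probe) <= max_len:
        return False
    sites = count_occurrences(target, reverse_complement(probe))
    sites += count_occurrences(target, probe)
    return sites == 1


if __name__ == "__main__":
    toy = "CCCCATGGCTAGCAAGGAGGAGCTGTTCACCGGGGTTTT"  # hypothetical target
    probe = reverse_complement(toy[4:20])            # 16-nt probe for one region
    print(probe, probe_is_suitable(probe, toy), probe_is_suitable(probe, toy + toy))
```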

The primer or probe may, for example without limitation, be prepared by direct
chemical synthesis, by PCR (U.S. Patent Nos. 4,683,195 and 4,683,202), or by excising the
nucleic acid specific fragment from a larger nucleic acid molecule. Various methods for
determining the structure of PCR probes and PCR techniques exist in the art. Computer-
generated searches using programs such as Primer3 (www-genome.wi.mit.edu/cgi-
bin/primer/primer3.cgi), STSPipeline (www-genome.wi.mit.edu/cgi-bin/www-
STS_Pipeline), or GeneUp (Pesole et al., BioTechniques 25: 112-123, 1998), for example, can
be used to identify potential PCR primers.

II. Protein and Peptide Molecules

Agents of the invention include proteins, peptide molecules, and fragments thereof
encoded by nucleic acid agents of the invention. Preferred classes of protein and peptide
molecules include: (1) GCPE proteins and peptide molecules; (2) GCPE proteins and peptide
molecules derived from an organism having a MEP pathway; (3) GCPE proteins and peptide
molecules derived from plants; and (4) GCPE proteins and peptide molecules derived from
oilseed plants, including, but not limited to, Brassica campestris, Brassica napus, canola,
castor bean, coconut, cotton, crambe, linseed, maize, mustard, oil palm, peanut, rapeseed,
rice, safflower, sesame, soybean, sunflower, and wheat.

Other preferred proteins are those proteins having an amino acid sequence: (1)
selected from the group consisting of SEQ ID NOs: 4, 48, 49, and 50; (2) selected from the
group consisting of SEQ ID NOs: 4, 48 and 49; (3) selected from the group consisting of
SEQ ID NOs: 4 and 49; (4) of SEQ ID NO: 4; (5) of SEQ ID NO: 48; (6) of SEQ ID NO: 49;
and (7) of SEQ ID NO: 50.

In another preferred aspect of the present invention, the protein or peptide molecule is
encoded by a nucleic acid agent of the invention, including, but not limited to, a nucleic acid
sequence that is selected from: (1) any of SEQ ID NOs: 1 through 3, 5 through 47,
complements thereof, or fragments of these sequences; (2) the group consisting of SEQ ID
NOs: 1, 2, complements thereof, and fragments of these sequences; (3) the group consisting
of SEQ ID NOs: 1, 2, 3, complements thereof and fragments of these sequences; (4) the
group consisting of SEQ ID NOs: 1, 2, 13 through 47, complements thereof and fragments of
these sequences; (5) the group consisting of SEQ ID NOs: 5 through 12, complements
thereof and fragments of these sequences; or (6) the group consisting of SEQ ID NOs: 1
through 3, 5 through 47, complements thereof and fragments of these sequences.

Any of the nucleic acid agents of the invention may be linked with additional nucleic
acid sequences to encode fusion proteins. The additional nucleic acid sequence preferably
encodes at least one amino acid, peptide, or protein. Many possible fusion combinations
exist. For instance, the fusion protein may provide a "tagged" epitope to facilitate detection
of the fusion protein, such as GST, GFP, FLAG, or polyHIS. Such fusions preferably encode
between 1 and about 50 amino acids, more preferably between about 5 and about 30
additional amino acids, and even more preferably between about 5 and about 20 amino acids.

Alternatively, the fusion may provide regulatory, enzymatic, cell signaling, or
intercellular transport functions. For example, a sequence encoding a plastid transit peptide
may be added to direct a fusion protein to the chloroplasts within seeds. Such fusion partners
preferably encode between 1 and about 1000 additional amino acids, more preferably
between about 5 and about 500 additional amino acids, and even more preferably between
about 10 and about 250 amino acids.

The above-described protein or peptide molecules may be produced via chemical
synthesis or, more preferably, by expression in a suitable bacterial or eukaryotic host.
Suitable methods for expression are described by Sambrook et al., supra, or similar texts.
Fusion protein or peptide molecules of the invention are preferably produced via
recombinant means. These proteins and peptide molecules may be derivatized to contain
carbohydrate or other moieties (such as keyhole limpet hemocyanin, etc.).

Also contemplated are protein and peptide agents, including fragments and fusions
thereof, in which conservative, non-essential or non-relevant amino acid residues have been
added, replaced or deleted. A further particularly preferred class of protein is a GCPE
protein, in which conservative, non-essential or non-relevant amino acid residues have been
added, replaced or deleted. Computerized means for designing modifications in protein
structure are known in the art. See, e.g., Dahiyat and Mayo, Science 278: 82-87 (1997).

A protein of the invention can also be a homolog protein. In a preferred
embodiment, the nucleic acid molecules of the present invention, complements thereof, and
fragments of these sequences can be utilized to obtain such homologs. In another preferred
embodiment, the homolog is selected from the group consisting of alfalfa, apple,
Arabidopsis, banana, barley, Brassica, Brassica campestris, Brassica napus, broccoli,
cabbage, canola, castor bean, chrysanthemum, citrus, coconut, coffee, cotton, crambe,
cranberry, cucumber, Cuphea, dendrobium, dioscorea, eucalyptus, fescue, fir, garlic,
gladiolus, grape, hordeum, lentils, lettuce, liliacea, linseed, maize, millet, muskmelon,
mustard, oat, oil palm, oilseed rape, onion, an ornamental plant, papaya, pea, peanut, pepper,
perennial ryegrass, Phaseolus, pine, poplar, potato, rapeseed (including Canola and High
Erucic Acid varieties), rice, rye, safflower, sesame, sorghum, soybean, strawberry, sugarbeet,
sugarcane, sunflower, tea, tomato, triticale, turf grasses, and wheat.

In a more preferred embodiment, the homolog is selected from Brassica campestris,
Brassica napus, canola, castor bean, coconut, cotton, crambe, linseed, maize, mustard, oil
palm, peanut, rapeseed (including Canola and High Erucic Acid varieties), rice, safflower,
sesame, soybean, sunflower, and wheat, and in a particularly preferred embodiment from 
coconut, crambe, maize, oil palm, peanut, rapeseed (including Canola and High Erucic Acid 
varieties), safflower, sesame, soybean, and sunflower. 

Agents of the invention include proteins comprising a contiguous region of at least about 10
amino acids, preferably at least about 20 amino acids, and even more preferably at least
about 25, 35, 50, 75, or 100 amino acids of a protein of the present invention. In another
preferred embodiment, the proteins of the present invention include a contiguous region of
between about 10 and about 25 amino acids, more preferably between about 20 and about
50 amino acids, and even more preferably between about 40 and about 80 amino acids.

Due to the degeneracy of the genetic code, different nucleotide codons may be used
to code for a particular amino acid. A host cell often displays a preferred pattern of codon
usage. Nucleic acid sequences are preferably constructed to utilize the codon usage pattern
of the particular host cell. This generally enhances the expression of the nucleic acid
sequence in a transformed host cell. Any of the above-described nucleic acid and amino acid
sequences may be modified to reflect the preferred codon usage of a host cell or organism in
which they are contained. Modification of a nucleic acid sequence for optimal codon usage
in plants is described in U.S. Patent No. 5,689,052. Additional variations in the nucleic acid
sequences may encode proteins having equivalent or superior characteristics when compared
to the proteins from which they are engineered.

It is understood that certain amino acids may be substituted for other amino acids in
a protein or peptide structure (and the nucleic acid sequence that codes for it) without
appreciable change or loss of its biological utility or activity. For example, amino acid
substitutions may be made without appreciable loss of interactive binding capacity in the
antigen-binding regions of antibodies, or binding sites on substrate molecules. The
modifications may result in either conservative or non-conservative changes in the amino
acid sequence. The amino acid changes may be achieved by changing the codons of the
nucleic acid sequence, according to the codons given in Table 1.

Table 1: Codon degeneracy of amino acids

Amino acid      One letter  Three letter  Codons
Alanine         A           Ala           GCA GCC GCG GCT
Cysteine        C           Cys           TGC TGT
Aspartic acid   D           Asp           GAC GAT
Glutamic acid   E           Glu           GAA GAG
Phenylalanine   F           Phe           TTC TTT
Glycine         G           Gly           GGA GGC GGG GGT
Histidine       H           His           CAC CAT
Isoleucine      I           Ile           ATA ATC ATT
Lysine          K           Lys           AAA AAG
Leucine         L           Leu           TTA TTG CTA CTC CTG CTT
Methionine      M           Met           ATG
Asparagine      N           Asn           AAC AAT
Proline         P           Pro           CCA CCC CCG CCT
Glutamine       Q           Gln           CAA CAG
Arginine        R           Arg           AGA AGG CGA CGC CGG CGT
Serine          S           Ser           AGC AGT TCA TCC TCG TCT
Threonine       T           Thr           ACA ACC ACG ACT
Valine          V           Val           GTA GTC GTG GTT
Tryptophan      W           Trp           TGG
Tyrosine        Y           Tyr           TAC TAT
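Table 1 can be used directly to make the silent codon changes described above. The Python sketch below encodes the table, translates a coding sequence, and substitutes synonymous codons from a caller-supplied preference map. The preference map in the example is hypothetical and is not a measured codon-usage table for any host (the approach in U.S. Patent No. 5,689,052 is not reproduced here), and stop codons are rejected because Table 1 does not list them.

```python
from typing import Dict, List

# Table 1, keyed by one-letter code.
TABLE_1: Dict[str, str] = {
    "A": "GCA GCC GCG GCT", "C": "TGC TGT", "D": "GAC GAT", "E": "GAA GAG",
    "F": "TTC TTT", "G": "GGA GGC GGG GGT", "H": "CAC CAT", "I": "ATA ATC ATT",
    "K": "AAA AAG", "L": "TTA TTG CTA CTC CTG CTT", "M": "ATG", "N": "AAC AAT",
    "P": "CCA CCC CCG CCT", "Q": "CAA CAG", "R": "AGA AGG CGA CGC CGG CGT",
    "S": "AGC AGT TCA TCC TCG TCT", "T": "ACA ACC ACG ACT", "V": "GTA GTC GTG GTT",
    "W": "TGG", "Y": "TAC TAT",
}
CODON_TO_AA: Dict[str, str] = {
    codon: aa for aa, codons in TABLE_1.items() for codon in codons.split()
}


def split_codons(dna: str) -> List[str]:
    """Split an in-frame coding sequence into codons."""
    if len(dna) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def translate(dna: str) -> str:
    """Translate with Table 1; raises KeyError for codons not in the table."""
    return "".join(CODON_TO_AA[c] for c in split_codons(dna.upper()))


def recode(dna: str, preferred: Dict[str, str]) -> str:
    """Replace each codon with the preferred synonymous codon, if one is given."""
    out = []
    for codon in split_codons(dna.upper()):
        aa = CODON_TO_AA[codon]
        choice = preferred.get(aa, codon)
        if CODON_TO_AA.get(choice) != aa:
            raise ValueError(f"{choice} does not encode {aa}")
        out.append(choice)
    return "".join(out)


if __name__ == "__main__":
    original = "ATGGCTTTAAAA"                   # hypothetical coding sequence (M A L K)
    host_preference = {"L": "CTG", "K": "AAG"}  # hypothetical preference map
    changed = recode(original, host_preference)
    print(changed, translate(original) == translate(changed))
```

Because every substitution is drawn from the same row of Table 1, the recoded sequence encodes the same protein; only the codon choice changes.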



It is well known in the art that one or more amino acids in a native sequence can be
substituted with other amino acid(s), the charge and polarity of which are similar to those of
the native amino acid, i.e., a conservative amino acid substitution, resulting in a silent
change. Conservative substitutes for an amino acid within the native polypeptide sequence
can be selected from other members of the class to which the amino acid belongs. Amino
acids can be divided into the following four groups: (1) acidic (negatively charged) amino
acids, such as aspartic acid and glutamic acid; (2) basic (positively charged) amino acids,
such as arginine, histidine, and lysine; (3) neutral polar amino acids, such as glycine, serine,
threonine, cysteine, cystine, tyrosine, asparagine, and glutamine; and (4) neutral nonpolar
(hydrophobic) amino acids, such as alanine, leucine, isoleucine, valine, proline,
phenylalanine, tryptophan, and methionine.
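The following is a minimal Python sketch of the four-class scheme described above. It assumes that a substitution counts as conservative exactly when the native and replacement residues fall in the same class. Cystine is a disulfide-linked pair of cysteines with no one-letter code, so only cysteine (C) is listed. The names `AA_CLASS` and `is_conservative` are hypothetical.

```python
# The four classes described in the text. Cystine has no one-letter code and
# is therefore represented only through cysteine (C).
AA_CLASS = {
    **dict.fromkeys("DE", "acidic"),
    **dict.fromkeys("RHK", "basic"),
    **dict.fromkeys("GSTCYNQ", "neutral polar"),
    **dict.fromkeys("ALIVPFWM", "neutral nonpolar"),
}


def is_conservative(native: str, substitute: str) -> bool:
    """True if two different residues belong to the same class."""
    native, substitute = native.upper(), substitute.upper()
    if native == substitute:
        return False  # identical residues are not a substitution at all
    return AA_CLASS[native] == AA_CLASS[substitute]


print(is_conservative("D", "E"))  # True: both acidic
print(is_conservative("L", "K"))  # False: nonpolar vs. basic
```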

In a further aspect of the present invention, nucleic acid molecules of the present
invention can comprise sequences that differ from those encoding a protein or fragment
thereof selected from the group consisting of SEQ ID NOs: 4 and 48 through 50 because
the different nucleic acid sequence encodes a protein having one or more
conservative amino acid changes.

In a preferred aspect, biologically functional equivalents of the proteins or fragments
thereof of the present invention can have about 10 or fewer conservative amino acid changes,
more preferably about 7 or fewer conservative amino acid changes, and most preferably
about 5 or fewer conservative amino acid changes. In a preferred embodiment, the protein
has between about 5 and about 500 conservative changes, more preferably between about 10
and about 300 conservative changes, even more preferably between about 25 and about 150
conservative changes, and most preferably between about 5 and about 25 conservative
changes or between 1 and about 5 conservative changes.
Non-conservative changes include additions, deletions, and substitutions that result
in an altered amino acid sequence. In a preferred embodiment, the protein has between about
5 and about 500 non-conservative amino acid changes, more preferably between about 10
and about 300 non-conservative amino acid changes, even more preferably between about 25
and about 150 non-conservative amino acid changes, and most preferably between about 5
and about 25 non-conservative amino acid changes or between 1 and about 5
non-conservative changes.
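To count changes of each kind, the variant must first be aligned against the native sequence. The sketch below assumes that alignment has already been done: both strings have equal length, and "-" marks an addition or deletion. It then tallies conservative and non-conservative changes using the four classes described earlier. It is an illustrative helper with hypothetical names, not a method taken from the source.

```python
# Four side-chain classes, as described earlier in the text.
CLASSES = ("DE", "RHK", "GSTCYNQ", "ALIVPFWM")
CLASS_OF = {aa: i for i, group in enumerate(CLASSES) for aa in group}


def count_changes(native: str, variant: str) -> tuple[int, int]:
    """Return (conservative, non_conservative) for two pre-aligned sequences.

    Both strings must have the same length; "-" marks a gap, so a residue
    opposite a gap is counted as an addition or deletion (non-conservative).
    """
    if len(native) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    conservative = non_conservative = 0
    for a, b in zip(native.upper(), variant.upper()):
        if a == b:
            continue  # unchanged position (or a gap in both)
        if "-" in (a, b) or CLASS_OF[a] != CLASS_OF[b]:
            non_conservative += 1
        else:
            conservative += 1
    return conservative, non_conservative


print(count_changes("MKDLV", "MRE-V"))  # (2, 1)
```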

In making such changes, the role of the hydropathic index of amino acids in
conferring interactive biological function on a protein may be considered. See Kyte and
Doolittle, J. Mol. Biol. 157:105-132 (1982). It is accepted that the relative hydropathic
character of amino acids contributes to the secondary structure of the resultant protein, which
in turn defines the interaction of the protein with other molecules, e.g., enzymes, substrates,
receptors, DNA, antibodies, antigens, etc. It is also understood in the art that the substitution
of like amino acids may be made effectively on the basis of hydrophilicity, as the greatest
local average hydrophilicity of a protein is known to correlate with a biological property of
the protein. U.S. Patent No. 4,554,101.

Each amino acid has been assigned a hydropathic index and a hydrophilic value, as 
shown in Table 2. 



Table 2: Amino Acid Hydropathic Indices and Hydrophilic Values

Amino acid       Hydropathic Index   Hydrophilic Value
Alanine          +1.8                -0.5
Cysteine         +2.5                -1.0
Aspartic acid    -3.5                +3.0 ±1
Glutamic acid    -3.5                +3.0 ±1
Phenylalanine    +2.8                -2.5
Glycine          -0.4                0
Histidine        -3.2                -0.5
Isoleucine       +4.5                -1.8
Lysine           -3.9                +3.0
Leucine          +3.8                -1.8
Methionine       +1.9                -1.3
Asparagine       -3.5                +0.2
Proline          -1.6                -0.5 ±1
Glutamine        -3.5                +0.2
Arginine         -4.5                +3.0
Serine           -0.8                +0.3
Threonine        -0.7                -0.4
Valine           +4.2                -1.5
Tryptophan       -0.9                -3.4
Tyrosine         -1.3                -2.3
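The following Python sketch illustrates the local average hydrophilicity mentioned above. It slides a window along a protein sequence and averages the hydrophilic values from Table 2. It rests on two assumptions. First, it uses the central value for the entries given as ±1 (Asp, Glu, Pro). Second, the default six-residue window is an illustrative choice, not a parameter stated in this document.

```python
# Hydrophilic values from Table 2. Asp, Glu and Pro are listed with a range
# of ±1; the central value is used for them here (an assumption).
HYDROPHILIC = {
    "A": -0.5, "C": -1.0, "D": 3.0, "E": 3.0, "F": -2.5, "G": 0.0,
    "H": -0.5, "I": -1.8, "K": 3.0, "L": -1.8, "M": -1.3, "N": 0.2,
    "P": -0.5, "Q": 0.2, "R": 3.0, "S": 0.3, "T": -0.4, "V": -1.5,
    "W": -3.4, "Y": -2.3,
}


def local_averages(seq: str, window: int = 6) -> list[float]:
    """Average hydrophilic value of every run of `window` consecutive residues."""
    values = [HYDROPHILIC[aa] for aa in seq.upper()]
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and the sequence length")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def most_hydrophilic_window(seq: str, window: int = 6) -> tuple[int, float]:
    """Return (0-based start, average) of the first window with the greatest average."""
    averages = local_averages(seq, window)
    start = max(range(len(averages)), key=averages.__getitem__)
    return start, averages[start]


print(most_hydrophilic_window("LLLKKDEKLL", window=3))  # (3, 3.0)
```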
It is known in the art that certain amino acids may be substituted by other amino
acids having a similar hydropathic or hydrophilic index, score or value, and still result in a
protein with similar biological activity, i.e., still obtain a biologically functional protein. In
making such changes, the substitution of amino acids whose hydropathic indices or
hydrophilic values are within ±2 is preferred, those within ±1 are more preferred, and those
within ±0.5 are most preferred.
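The ±2, ±1 and ±0.5 tiers can be applied mechanically. This Python sketch reports which tier a proposed substitution falls into, using the hydropathic indices of Table 2 and hypothetical function names. A dictionary of the hydrophilic values from Table 2 could be passed as the scale instead. The sketch rounds differences to two decimals so that floating-point error does not move a pair across a tier boundary.

```python
# Hydropathic indices from Table 2 (Kyte and Doolittle scale).
HYDROPATHIC = {
    "A": 1.8, "C": 2.5, "D": -3.5, "E": -3.5, "F": 2.8, "G": -0.4,
    "H": -3.2, "I": 4.5, "K": -3.9, "L": 3.8, "M": 1.9, "N": -3.5,
    "P": -1.6, "Q": -3.5, "R": -4.5, "S": -0.8, "T": -0.7, "V": 4.2,
    "W": -0.9, "Y": -1.3,
}


def substitution_tier(native: str, substitute: str,
                      scale: dict[str, float] = HYDROPATHIC) -> str:
    """Classify a substitution by the difference between the two scale values."""
    diff = round(abs(scale[native.upper()] - scale[substitute.upper()]), 2)
    if diff <= 0.5:
        return "most preferred"   # within ±0.5
    if diff <= 1.0:
        return "more preferred"   # within ±1
    if diff <= 2.0:
        return "preferred"        # within ±2
    return "outside ±2"


print(substitution_tier("I", "V"))  # most preferred (4.5 vs 4.2)
print(substitution_tier("G", "P"))  # preferred (-0.4 vs -1.6)
```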

As outlined above, amino acid substitutions are therefore based on the relative
similarity of the amino acid side-chain substituents, for example, their hydrophobicity,
hydrophilicity, charge, size, and the like. Exemplary substitutions which take several of the
foregoing characteristics into consideration are well known to those of skill in the art and
include: arginine and lysine; glutamate and aspartate; serine and threonine; glutamine and
asparagine; and valine, leucine, and isoleucine.

These amino acid changes may be effected by mutating the nucleic acid sequence
coding for the protein or peptide. Mutations to a nucleic acid sequence may be introduced in
either a specific or a random manner, both of which are well known to those of skill in the art
of molecular biology. Mutations may include deletions, insertions, truncations, substitutions,
fusions, shuffling of motif sequences, and the like. A myriad of site-directed mutagenesis
techniques exist, typically using oligonucleotides to introduce mutations at specific locations
in a structural nucleic acid sequence. Examples include single-strand rescue, unique site
elimination, nick protection, and PCR. Random or non-specific mutations may be generated
by chemical agents such as nitrosoguanidine and 2-aminopurine (for a general review, see
Singer and Kusmierek, Ann. Rev. Biochem. 52:655-693, 1982), or by biological methods
such as passage through mutator strains (Greener et al., Mol. Biotechnol. 7:189-195, 1997).
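In oligonucleotide-directed mutagenesis, the mutagenic primer commonly carries the desired codon in place of the target codon, flanked on each side by unchanged sequence. The Python sketch below assumes a 0-based codon index and equal flank lengths. It also returns a complementary reverse primer, as used in some PCR-based protocols. The function names and the default flank of 15 nt are hypothetical. The sketch does not address practical concerns such as melting temperature.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def mutagenic_primers(cds: str, codon_index: int, new_codon: str,
                      flank: int = 15) -> tuple[str, str]:
    """Return (forward, reverse) primers that replace one codon.

    The forward primer is the new codon flanked by up to `flank` unchanged
    nucleotides on each side; the reverse primer is its reverse complement.
    """
    cds, new_codon = cds.upper(), new_codon.upper()
    if len(new_codon) != 3 or set(new_codon) - set("ACGT"):
        raise ValueError("new_codon must be three letters from A, C, G, T")
    start = codon_index * 3
    if codon_index < 0 or start + 3 > len(cds):
        raise IndexError("codon_index lies outside the coding sequence")
    forward = cds[max(0, start - flank):start] + new_codon + cds[start + 3:start + 3 + flank]
    return forward, reverse_complement(forward)


# Change codon 1 (AAA, Lys) to AGA (Arg) with 3-nt flanks.
print(mutagenic_primers("ATGAAAGCTTTTGGA", 1, "AGA", flank=3))
# ('ATGAGAGCT', 'AGCTCTCAT')
```

Because AAA and AGA both appear in Table 1, the Lys-to-Arg change shown here is also one of the exemplary substitutions listed above.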

C. Recombinant Vectors and Constructs

Exogenous and/or heterologous genetic material may be transferred into a host cell
by use of a vector or construct designed for such a purpose. Any of the nucleic acid
sequences described above may be provided in a recombinant vector. The vector may be a
linear or a closed circular plasmid. The vector system may be a single vector or plasmid or
two or more vectors or plasmids that together contain the total DNA to be introduced into the
genome of the host. Means for preparing recombinant vectors are well known in the art.
Methods for making recombinant vectors particularly suited to plant transformation are
described in U.S. Patent Nos. 4,971,908; 4,940,835; 4,769,061; and 4,757,011.

Typical vectors useful for expression of nucleic acids in higher plants are well known
in the art and include vectors derived from the tumor-inducing (Ti) plasmid of
Agrobacterium tumefaciens. Other vector systems suitable for introducing transforming
DNA into a host plant cell include, but are not limited to, the pCaMVCN transfer control
vector, binary bacterial artificial chromosome (BIBAC) vectors (Hamilton et al., Gene
200:107-116, 1997), and transfection with RNA viral vectors (Della-Cioppa et al., Ann. N.Y.
Acad. Sci. 792:57-61, 1996). Additional vector systems also include plant selectable YAC
vectors such as those described in Mullen et al., Molecular Breeding 4:449-457 (1998).

A construct or vector may include a promoter. For example, a recombinant vector typically
comprises, in a 5' to 3' orientation, a promoter to direct the transcription of a nucleic acid
sequence of interest and the nucleic acid sequence of interest. Suitable promoters include, but
are not limited to, those described herein. The recombinant vector may further comprise a 3'
transcriptional terminator, a 3' polyadenylation signal, other untranslated nucleic acid
sequences, transit and targeting nucleic acid sequences, selectable markers, enhancers, and
operators, as desired.
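The 5' to 3' arrangement described above can be modeled as an ordered list of elements. The Python sketch below is a hypothetical data structure, not a format used by the source. It covers a single transcription unit and assumes the element order promoter, leader, transit, coding, terminator. It checks that a promoter and a coding sequence are present and that the elements appear in that order.

```python
from dataclasses import dataclass, field

# Assumed 5' -> 3' order of elements in one transcription unit.
ORDER = ("promoter", "leader", "transit", "coding", "terminator")


@dataclass
class Element:
    kind: str   # one of ORDER
    name: str   # e.g. "CaMV 35S"


@dataclass
class Cassette:
    elements: list[Element] = field(default_factory=list)

    def validate(self) -> None:
        """Raise ValueError if an element is unknown, missing, or out of order."""
        for e in self.elements:
            if e.kind not in ORDER:
                raise ValueError(f"unknown element kind: {e.kind}")
        ranks = [ORDER.index(e.kind) for e in self.elements]
        if ranks != sorted(ranks):
            raise ValueError("elements are not in 5' to 3' order")
        present = {e.kind for e in self.elements}
        for required in ("promoter", "coding"):
            if required not in present:
                raise ValueError(f"missing required element: {required}")

    def describe(self) -> str:
        """Element names joined in 5' to 3' order."""
        return " -> ".join(e.name for e in self.elements)


cassette = Cassette([Element("promoter", "CaMV 35S"),
                     Element("coding", "gene of interest"),
                     Element("terminator", "nos 3'")])
cassette.validate()
print(cassette.describe())  # CaMV 35S -> gene of interest -> nos 3'
```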

The vector may be an autonomously replicating vector, i.e., a vector that exists as an
extrachromosomal entity, the replication of which is independent of chromosomal
replication, e.g., a plasmid, an extrachromosomal element, a minichromosome, or an
artificial chromosome. The vector may contain any means for assuring self-replication. For
autonomous replication, the vector may further comprise an origin of replication enabling the
vector to replicate autonomously in the host cell in question. Alternatively, the vector may
be one that, when introduced into the cell, is integrated into the genome and replicated
together with the chromosome(s) into which it has been integrated. This integration may be
the result of homologous or non-homologous recombination.

Integration of a vector or nucleic acid into the genome by homologous
recombination, regardless of the host being considered, relies on the nucleic acid sequence of
the vector. Typically, the vector contains nucleic acid sequences for directing integration by
homologous recombination into the genome of the host. These nucleic acid sequences enable
the vector to be integrated into the host cell genome at a precise location or locations in one
or more chromosomes. To increase the likelihood of integration at a precise location, there
should preferably be two nucleic acid sequences that individually contain a sufficient number
of nucleotides, preferably about 400 bp to about 1500 bp, more preferably about 800 bp to
about 1000 bp, which are highly homologous with the corresponding host cell target
sequence. These nucleic acid sequences may be any sequence that is homologous with a host
cell target sequence and, furthermore, may or may not encode proteins.
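The preferred homology-arm lengths above lend themselves to a simple design check. The following Python sketch compares each arm with an equal-length host target sequence, without gaps. It uses 95% identity as a stand-in for "highly homologous", a threshold the text does not specify. All names are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity of two equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def check_homology_arm(arm: str, target: str, min_len: int = 400,
                       max_len: int = 1500, min_identity: float = 95.0) -> list[str]:
    """Return a list of problems; an empty list means the arm passes.

    The length bounds follow the broader preferred range in the text; the
    95% identity cut-off is an assumed stand-in for "highly homologous".
    """
    problems = []
    if not min_len <= len(arm) <= max_len:
        problems.append(f"length {len(arm)} bp is outside {min_len}-{max_len} bp")
    identity = percent_identity(arm, target)
    if identity < min_identity:
        problems.append(f"identity {identity:.1f}% is below {min_identity}%")
    return problems


# A vector would normally carry two arms, each checked independently.
left = "ACGT" * 200  # 800 bp, within the more preferred 800-1000 bp range
print(check_homology_arm(left, left))  # []
```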

Vectors suitable for replication in mammalian cells may include viral replicons, or
sequences that ensure integration of the appropriate sequences encoding HCV epitopes into
the host genome. For example, another vector used to express foreign DNA is vaccinia virus.
Such heterologous DNA is generally inserted into a gene that is non-essential to the virus, for
example, the thymidine kinase gene (tk), which also provides a selectable marker.
Expression of the HCV polypeptide then occurs in cells or animals that are infected with the
live recombinant vaccinia virus.

WO 02/12478                                                PCT/US01/24335

In general, plasmid vectors containing replicon and control sequences that are
derived from species compatible with the host cell are used in connection with bacterial
hosts. The vector ordinarily carries a replication site, as well as marking sequences that are
capable of providing phenotypic selection in transformed cells. For example, E. coli is
typically transformed using pBR322, which contains genes for ampicillin and tetracycline
resistance and thus provides easy means for identifying transformed cells. The pBR322
plasmid, or other microbial plasmid or phage, also generally contains, or is modified to
contain, promoters that can be used by the microbial organism for expression of the
selectable marker genes.

Promoters

Promoters used in the context of the present invention are selected on the basis of the
cell type into which the vector will be inserted. Promoters that function in bacteria, yeast,
and plants are all taught in the art. The promoters may also be selected on the basis of their
regulatory features, e.g., enhancement of transcriptional activity, inducibility, tissue
specificity, and developmental stage-specificity. Additional promoters that may be utilized
are described, for example, in U.S. Patent Nos. 5,378,619; 5,391,725; 5,428,147; 5,447,858;
5,608,144; 5,614,399; 5,633,441; 5,633,435; and 4,633,436.

Particularly preferred promoters in the recombinant vector include the nopaline
synthase (nos) promoter; the mannopine synthase (mas) promoter; the octopine synthase (ocs)
promoter; the cauliflower mosaic virus (CaMV) 19S and 35S promoters; the enhanced
CaMV 35S promoter (eCaMV); the Figwort Mosaic Virus (FMV) 35S promoter; the
light-inducible promoter from the small subunit of ribulose-1,5-bisphosphate carboxylase
(ssRUBISCO); the EIF-4A promoter from tobacco; corn sucrose synthetase 1; corn alcohol
dehydrogenase 1; corn light harvesting complex; corn heat shock protein; the chitinase
promoter from Arabidopsis; the LTP (Lipid Transfer Protein) promoters from broccoli;
petunia chalcone isomerase; bean glycine rich protein 1; potato patatin; the ubiquitin
promoter from maize; the Adh promoter; the R gene complex promoter; and the actin
promoter from rice.
The promoter is most preferably the nos, ocs, mas, CaMV19S, CaMV35S, eCaMV,
ssRUBISCO, FMV, CaMV derived AS4, tobacco RB7, wheat POX1, tobacco EIF-4, lectin
protein (Le1), or rice RC2 promoter. The promoter is preferably seed selective, tissue
selective, constitutive, or inducible.

Often-used constitutive promoters include the CaMV 35S promoter, the eCaMV 35S
promoter, the FMV promoter, the mas promoter, the nos promoter, and the ocs promoter,
which is carried on tumor-inducing plasmids of Agrobacterium tumefaciens.

Useful inducible promoters include promoters induced by salicylic acid or
polyacrylic acids (PR-1), induced by application of safeners (substituted benzenesulfonamide
herbicides), heat-shock promoters, a nitrate-inducible promoter derived from the spinach
nitrite reductase structural nucleic acid sequence, hormone-inducible promoters, and light-
inducible promoters associated with the small subunit of RuBP carboxylase and LHCP
families.

For the purposes of expression in specific tissues of the plant, such as the leaf, seed,
root or stem, it is preferred that the promoters utilized have relatively high expression in
these specific tissues or organs. Examples reported in the literature include the chloroplast
glutamine synthetase GS2 promoter from pea, the chloroplast fructose-1,6-bisphosphatase
(FBPase) promoter from wheat, the nuclear photosynthetic ST-LS1 promoter from potato,
the serine/threonine kinase (PAL) promoter and the glucoamylase (CHS) promoter from A.
thaliana.

Also reported to be active in photosynthetically active tissues are the
ribulose-1,5-bisphosphate carboxylase (RbcS) promoter from eastern larch (Larix laricina), the promoters
for the cab genes of pine, wheat, spinach, and rice, the pyruvate orthophosphate dikinase
(PPDK) promoter from maize, the promoter for the tobacco Lhcb1*2 gene, the A. thaliana
SUC2 sucrose-H+ symporter promoter, and the promoter for the thylakoid membrane proteins
from spinach (psaD, psaF, psaE, PC, FNR, atpC, atpD, cab, rbcS). Other promoters for the
chlorophyll a/b-binding proteins may also be utilized in the invention, such as the promoters
for the LhcB gene and PsbP gene from white mustard.

For the purpose of expression in sink tissues of the plant, such as the tuber of the
potato plant, the fruit of tomato, or the seed of maize, wheat, rice and barley, it is preferred
that the promoters utilized in the invention have relatively high expression in these specific
tissues. A number of promoters for genes with tuber-specific or tuber-enhanced expression
are known, including the class I patatin promoter, the promoters for both the large and small
subunits of the potato tuber ADPGPP genes, the sucrose synthase promoter, the promoters for the
major tuber proteins including the 22 kD protein complexes and protease inhibitors, the
promoter for the granule-bound starch synthase gene (GBSS), and other class I and II patatin
promoters.

Plant functional promoters useful for preferential expression in seeds include those
from plant storage proteins and from proteins involved in fatty acid biosynthesis in oilseeds.
Examples of such promoters include the 5' regulatory regions from such genes as napin,
phaseolin, zein, soybean trypsin inhibitor, ACP, stearoyl-ACP desaturase, the soybean α' subunit
of β-conglycinin (soy 7S), and oleosin. Further examples include the promoter for
β-conglycinin and the lectin promoter from soybean. Seed-specific regulation is further
discussed in EP 255 378.

Also included are promoters for the zeins, which are a group of storage proteins
found in maize endosperm. Genomic clones for zein genes have been isolated, and the
promoters from these clones, including the 15 kD, 16 kD, 19 kD, 22 kD, 27 kD and genes,
can also be used. Other promoters known to function, for example, in maize include the
promoters for the following genes: waxy, Brittle, Shrunken 2, Branching enzymes I and II,
starch synthases, debranching enzymes, oleosins, glutelins and sucrose synthases. A
particularly preferred promoter for maize endosperm expression is the promoter for the
glutelin gene from rice, more particularly the Osgt-1 promoter.

Examples of promoters suitable for expression in wheat include those promoters for
the ADP glucose pyrophosphorylase (ADPGPP) subunits, the granule bound and other starch
synthases, the branching and debranching enzymes, the embryogenesis-abundant proteins, the
gliadins and the glutenins. Preferred promoters in rice include promoters for the ADPGPP
subunits, the granule bound and other starch synthases, the branching enzymes, the
debranching enzymes, sucrose synthases and the glutelins, and particularly preferred is the
promoter for rice glutelin, Osgt-1. Preferred promoters for barley include those promoters
for the ADPGPP subunits, the granule bound and other starch synthases, the branching
enzymes, the debranching enzymes, sucrose synthases, the hordeins, the embryo globulins
and the aleurone specific proteins.

Root specific promoters can also be used. An example of such a promoter is the
promoter for the acid chitinase gene. Expression in root tissue can also be accomplished by
utilizing the root specific subdomains of the CaMV35S promoter that have been identified.
Other root cell specific promoters include those reported by Conkling et al., Plant Physiol.
93:1203-1211 (1990).

Examples of suitable promoters for use with filamentous fungi are obtained from the
genes encoding Aspergillus oryzae TAKA amylase, Rhizomucor miehei aspartic proteinase,
Aspergillus niger neutral alpha-amylase, A. niger acid stable alpha-amylase, A. niger or A.
awamori glucoamylase (glaA), Rhizomucor miehei lipase, Aspergillus oryzae alkaline
protease, A. oryzae triose phosphate isomerase, Aspergillus nidulans acetamidase, and hybrids
thereof. In a yeast host, preferred promoters include the Saccharomyces cerevisiae enolase
(eno-1), the TAKA amylase, NA2-tpi (a hybrid of the promoters from the genes encoding A.
niger neutral alpha-amylase and A. oryzae triose phosphate isomerase), glaA, S. cerevisiae
GAL1 (galactokinase) and S. cerevisiae GPD (glyceraldehyde-3-phosphate dehydrogenase)
promoters.

Suitable promoters for mammalian cells are also known in the art and include viral
promoters, such as those from Simian Virus 40 (SV40), Rous sarcoma virus (RSV),
adenovirus (ADV), cytomegalovirus (CMV), and bovine papilloma virus (BPV), as well as
mammalian cell-derived promoters. Other preferred promoters include the hematopoietic
stem cell-specific promoters, e.g., the CD34, glucose-6-phosphatase, interleukin-1 alpha,
CD11c integrin gene, GM-CSF, interleukin-5R alpha, interleukin-2, c-fos, h-ras, and DMD
gene promoters.

Inducible promoters suitable for use with bacterial hosts include the β-lactamase and
lactose promoter systems, the arabinose promoter system, alkaline phosphatase, a tryptophan
(trp) promoter system, and hybrid promoters such as the tac promoter. However, other known
bacterial inducible promoters are also suitable. Promoters for use in bacterial systems also
generally contain a Shine-Dalgarno sequence operably linked to the DNA encoding the
polypeptide of interest.

Examples of suitable promoters for an algal host are light harvesting protein
promoters obtained from photosynthetic organisms, Chlorella virus methyltransferase
promoters, the CaMV 35S promoter, the PL promoter from bacteriophage λ, the nopaline synthase
promoter from the Ti plasmid of A. tumefaciens, and the bacterial trp promoter.

Vectors for use with insect cells or insects may utilize a baculovirus transcriptional
promoter, including, but not limited to, promoters from the viral DNAs of Autographa
californica MNPV, Bombyx mori NPV, Trichoplusia ni MNPV, Rachiplusia ou MNPV or Galleria
mellonella MNPV, wherein the baculovirus transcriptional promoter is a baculovirus
immediate-early gene IE1 or IEN promoter; an immediate-early gene in combination with a
baculovirus delayed-early gene promoter region selected from the group consisting of 39K
and a HindIII-k fragment delayed-early gene; or a baculovirus late gene promoter.

Additional Nucleic Acid Sequences of Interest 

The recombinant vector may also contain one or more additional nucleic acid
sequences of interest. These additional nucleic acid sequences may generally be any
sequences suitable for use in a recombinant vector. Such nucleic acid sequences include,
without limitation, any of the nucleic acid sequences, and modified forms thereof, described
above. The additional nucleic acid sequences may also be operably linked to any of the
above described promoters. The one or more additional nucleic acid sequences may each be
operably linked to separate promoters. Alternatively, the additional nucleic acid sequences
may be operably linked to a single promoter (i.e., a single operon).

The additional nucleic acid sequences include, without limitation, those encoding
seed storage proteins, fatty acid pathway enzymes, tocopherol biosynthetic enzymes, amino
acid biosynthetic enzymes, and starch branching enzymes. Preferred seed storage proteins
include zeins, 7S proteins, brazil nut protein, phenylalanine-free proteins, albumin,
β-conglycinin, 11S proteins, alpha-hordothionin, arcelin seed storage proteins, lectins, and
glutenin. Preferred fatty acid pathway enzymes include thioesterases and desaturases.

Preferred tocopherol biosynthetic enzymes include tyrA, slr1736, ATPT2, dxs, dxr,
GGPPS, HPPD, GMT, MT1, AANT1, slr1737, and an antisense construct for homogentisic
acid dioxygenase. Preferred additional nucleic acid sequences encode MEP pathway proteins
including ygbB, ygbP, ychB, yfgA, yfgB, dxs and dxr. More preferred nucleic acid sequences
include yfgA and yfgB, and still other preferred nucleic acid sequences include ygbB, ychB
and ygbP. Preferred amino acid biosynthetic enzymes include anthranilate synthase,
tryptophan decarboxylase, threonine decarboxylase, threonine deaminase, and aspartate kinase.
Preferred starch branching enzymes include those set forth in U.S. Patent Nos. 6,232,122 and
6,147,279, and WO 97/22703.

Alternatively, the additional nucleic acid sequence may be designed to down-regulate a
specific nucleic acid sequence. This is typically accomplished by operably linking the
additional nucleic acid sequence, in an antisense orientation, with a promoter. One of ordinary
skill in the art is familiar with such antisense technology. Any nucleic acid sequence may be
negatively regulated in this manner. Preferable target nucleic acid sequences contain a low
content of essential amino acids, yet are expressed at relatively high levels in particular
tissues. For example, β-conglycinin and glycinin are expressed abundantly in seeds, but are
nutritionally deficient with respect to essential amino acids. This antisense approach may also
be used to effectively remove other undesirable proteins, such as antifeedants (e.g., lectins),
albumin, and allergens, from plant-derived foodstuffs.
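Placing a sequence in antisense orientation amounts to inserting its reverse complement downstream of the promoter, so that the transcript is complementary to the target mRNA. The Python sketch below illustrates only that sequence operation. The cassette function is a hypothetical string concatenation that ignores cloning junctions and other vector features.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def antisense(sense: str) -> str:
    """Return the reverse complement of a DNA fragment (antisense orientation)."""
    if set(sense.upper()) - set("ACGT"):
        raise ValueError("only A, C, G and T are supported")
    return sense.translate(COMPLEMENT)[::-1]


def antisense_cassette(promoter: str, target_fragment: str, terminator: str) -> str:
    """Hypothetical: promoter + fragment in antisense orientation + terminator."""
    return promoter + antisense(target_fragment) + terminator


print(antisense("ATGGCT"))  # AGCCAT
```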

Selectable and Screenable Markers 

A vector or construct may also include a selectable marker. Selectable markers can
also be used to select for plants or plant cells that contain the exogenous genetic material.
Examples of such include, but are not limited to: a neo gene, which codes for kanamycin
resistance and can be selected for using kanamycin, RptH, G418, hpt, etc.; a bar gene, which
codes for bialaphos resistance; a mutant EPSP synthase gene, aadA, which encodes
glyphosate resistance; a nitrilase gene, which confers resistance to bromoxynil; a mutant
acetolactate synthase gene (ALS), which confers imidazolinone or sulphonylurea resistance,
ALS; and a methotrexate resistant DHFR gene. The selectable marker is preferably GUS,
green fluorescent protein (GFP), neomycin phosphotransferase II (nptII), luciferase (LUX),
an antibiotic resistance coding sequence, or an herbicide (e.g., glyphosate) resistance coding
sequence. The selectable marker is most preferably a kanamycin, hygromycin, or herbicide
resistance marker.

A vector or construct can also include a screenable marker. Screenable markers are
useful to monitor expression. Exemplary screenable markers include: a β-glucuronidase or
uidA gene (GUS), which encodes an enzyme for which various chromogenic substrates are
known; an R-locus gene, which encodes a product that regulates the production of
anthocyanin pigments (red color) in plant tissues; a β-lactamase gene, which encodes an
enzyme for which various chromogenic substrates are known (e.g., PADAC, a chromogenic
cephalosporin); a luciferase gene; a xylE gene, which encodes a catechol dioxygenase that
can convert chromogenic catechols; an α-amylase gene; a tyrosinase gene, which encodes an
enzyme capable of oxidizing tyrosine to DOPA and dopaquinone, which in turn condenses to
melanin; and an α-galactosidase gene, which will turn over a chromogenic α-galactose substrate.

Included within the terms "selectable or screenable marker genes" are also genes that
encode a secretable marker whose secretion can be detected as a means of identifying or
selecting for transformed cells. Examples include markers that encode a secretable antigen
that can be identified by antibody interaction, or even secretable enzymes that can be
detected catalytically. Secretable proteins fall into a number of classes, including small,
diffusible proteins that are detectable (e.g., by ELISA), small active enzymes that are
detectable in extracellular solution (e.g., α-amylase, β-lactamase, phosphinothricin
transferase), or proteins that are inserted or trapped in the cell wall (such as proteins which
include a leader sequence such as that found in the expression unit of extensin or tobacco
PR-S). Other possible selectable and/or screenable marker genes will be apparent to those of
skill in the art.

Other Elements in the Recombinant Vector 

Various cis-acting untranslated 5' and 3' regulatory sequences may be included in the
recombinant nucleic acid vector to produce desirable regulatory features. A vector or
construct may also include regulatory elements. Examples of such include the Adh intron 1,
the sucrose synthase intron and the TMV omega element. These and other regulatory
elements may be included when appropriate, and may be provided by the DNA sequence
encoding the gene of interest or a convenient transcription termination region derived from a
different gene source.

A 3' non-translated region typically provides a transcriptional termination signal, and
a polyadenylation signal that functions in plants to cause the addition of adenylate
nucleotides to the 3' end of the mRNA. Such 3' non-translated regions can be obtained from
the 3' regions of the nopaline synthase (nos) coding sequence, a soybean 7Sα' storage protein
coding sequence, the arcelin-5 coding sequence, the albumin coding sequence, and the pea
ssRUBISCO E9 coding sequence. Particularly preferred 3' nucleic acid sequences include
Arcelin-5 3', nos 3', E9 3', adr12 3', 7Sα' 3', 11S 3', USP 3', and albumin 3'.

Translational enhancers may also be incorporated as part of the recombinant vector,
such as one or more 5' non-translated leader sequences that serve to enhance expression of
the nucleic acid sequence. Such enhancer sequences may be desirable to increase or alter the
translational efficiency of the resultant mRNA. Preferred 5' nucleic acid sequences include
dSSU 5', PetHSP70 5', and GmHSP17.9 5'. Such sequences can be derived from the
promoter selected to express the gene or can be specifically modified to increase translation
of the mRNA. Such regions can also be obtained from viral RNAs, from suitable eukaryotic
genes, or from a synthetic gene sequence. For a review of optimizing expression of
transgenes, see Koziel et al., Plant Mol. Biol. 32:393-405 (1996).

The recombinant vector can further comprise a nucleic acid sequence encoding a
transit peptide. This peptide may be useful for directing a protein to the extracellular space, a
plastid, or to some other compartment inside or outside of the cell (see, e.g., EP 0218571;
U.S. Patent Nos. 4,940,835, 5,610,041, 5,618,988, and 6,107,060). The nucleic acid
sequence in the recombinant vector may comprise introns. The introns may be heterologous
with respect to the structural nucleic acid sequence. Preferred introns include the rice actin
intron and the corn HSP70 intron.

A nucleic acid molecule of the invention encoding a protein or fragment thereof may
also be operably linked to a suitable leader sequence. A leader sequence is a nontranslated
region of an mRNA that is important for translation by the host. The leader sequence is
operably linked to the 5' terminus of the nucleic acid sequence encoding the protein or
fragment thereof. A polyadenylation sequence may also be operably linked to the 3' terminus
of the nucleic acid sequence of the invention. The polyadenylation sequence is a sequence
that, when transcribed, is recognized by the host as a signal to add polyadenosine residues to
transcribed mRNA.

A nucleic acid molecule of the invention encoding a protein or fragment thereof may
also be linked to a propeptide coding region. A propeptide is an amino acid sequence found
at the amino terminus of a proprotein or proenzyme. Cleavage of the propeptide from the
proprotein yields a mature biochemically active protein. A polypeptide that still carries its
propeptide is known as a propolypeptide or proenzyme (or a zymogen in some cases). Propolypeptides are
generally inactive and can be converted to mature active polypeptides by catalytic or
autocatalytic cleavage of the propeptide from the propolypeptide or proenzyme.

The recombinant vectors can further comprise one or more sequences that encode
one or more factors that are advantageous in the expression of the protein or peptide, for
example, an activator (e.g., a trans-acting factor), a chaperone and a processing protease. An
activator is a protein that activates transcription of a nucleic acid sequence encoding a
polypeptide, a chaperone is a protein that assists another protein in folding properly, and a
processing protease is a protease that cleaves a propeptide to generate a mature
biochemically active polypeptide. The nucleic acids encoding one or more of these factors
are preferably not operably linked to the nucleic acid encoding the protein or fragment
thereof.

E. Transgenic Organisms and Methods for Producing Same

One or more of the nucleic acid molecules or recombinant vectors of the invention
may be used in plant transformation or transfection. For example, exogenous genetic
material may be transferred into a plant cell and the plant cell regenerated into a whole,
fertile or sterile plant. In a preferred embodiment, the exogenous genetic material includes a
nucleic acid molecule of the present invention, preferably a nucleic acid molecule encoding a
GCPE protein. In another preferred embodiment, the nucleic acid molecule has a sequence
selected from the group consisting of SEQ ID NOs: 1 through 3, 5 through 47, complements
thereof and fragments of these sequences. Other preferred exogenous genetic material
includes nucleic acid molecules that encode a protein or fragment thereof having an amino acid
sequence selected from the group consisting of SEQ ID NOs: 4 and 48 through 50 or
fragments thereof.

The invention is also directed to transgenic plants and transformed host cells that
comprise, in a 5' to 3' orientation, a promoter operably linked to a heterologous nucleic acid
sequence of interest. Additional nucleic acid sequences may be introduced into the plant or
host cell, such as 3' transcriptional terminators, 3' polyadenylation signals, other untranslated
nucleic acid sequences, transit or targeting sequences, selectable markers, enhancers, and
operators. Preferred nucleic acid sequences of the present invention, including recombinant
vectors, structural nucleic acid sequences, promoters, and other regulatory elements, are
described above in parts A through C of the Detailed Description. Another embodiment of
the invention is directed to a method of producing such transgenic plants, which generally
comprises the steps of selecting a suitable plant, transforming the plant with a recombinant
vector, and obtaining the transformed host cell.

A transformed host cell may generally be any cell that is compatible with the
present invention. A transformed host plant or cell can be, or be derived from, a plant, or a
cell or organism such as a mammalian cell, mammal, fish cell, fish, bird cell, bird, algal cell,
alga, fungal cell, fungus, or bacterial cell. Preferred hosts and transformants include: fungal
cells such as Aspergillus; yeasts; mammals, particularly bovine and porcine; insects; bacteria;
and algae. Methods to transform such cells or organisms are known in the art. See, e.g., EP
238023; Becker and Guarente, in: Abelson and Simon (eds.), Guide to Yeast Genetics and
Molecular Biology, Methods Enzymol. 194:182-187, Academic Press, Inc., New York;
Bennett and LaSure (eds.), More Gene Manipulations in Fungi, Academic Press, CA, 1991;
Hinnen et al., PNAS 75:1920, 1978; Ito et al., J. Bacteriology 153:163, 1983; Malardier et
al., Gene 78:147-156, 1989; Yelton et al., PNAS 81:1470-1474, 1984.

Transfer of a nucleic acid that encodes a protein can result in expression or
overexpression of that protein in a transformed cell, transgenic organism or transgenic plant.
One or more of the proteins or fragments thereof encoded by nucleic acid molecules of the
invention may be overexpressed in a transformed cell, transgenic organism or transgenic
plant. Such expression or overexpression may be the result of transient or stable transfer of
the exogenous genetic material.

In a preferred embodiment, expression or overexpression of a GCPE protein in a host
provides in that host, relative to an untransformed host with a similar genetic background, an
increased level of: (1) tocotrienols; (2) tocopherols; (3) α-tocopherols; (4) γ-tocopherols;
(5) isopentenyl diphosphate (IPP); (6) dimethylallyl diphosphate (DMAPP); (7) a GCPE
protein in a plastid; (8) isoprenoids; (9) carotenoids; (10) an isoprenoid-related compound
selected from the group consisting of IPP, DMAPP, and a GCPE protein; or (11) an isoprenoid
compound selected from the group consisting of tocotrienols, tocopherols, terpenes,
gibberellins, carotenoids, xanthophylls, α-tocopherols, γ-tocopherols, IPP, DMAPP, and a
GCPE protein.
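The comparison against an untransformed host of similar genetic background amounts to a
ratio of measured levels. The sketch below assumes both levels were measured with the same
assay and in the same units; the function name `relative_level` and the numbers are
hypothetical, not data from this application.

```python
def relative_level(transformed: float, control: float) -> float:
    """Fold change of a compound in a transformed host versus an untransformed
    host of similar genetic background (same assay, same units assumed)."""
    if control <= 0:
        raise ValueError("control level must be positive")
    return transformed / control


# Hypothetical tocopherol levels in arbitrary units, not measured data.
fold = relative_level(transformed=36.0, control=24.0)
print(f"{fold:.2f}-fold; increased: {fold > 1}")  # 1.50-fold; increased: True
```

A value above 1 corresponds to the "increased level" recited above; whether a given increase
is meaningful would, in practice, also depend on replicate measurements and statistics.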

The expressed protein may be detected using methods known in the art that are
specific for the particular protein or fragment. These detection methods may include the use
of specific antibodies, formation of an enzyme product, or disappearance of an enzyme
substrate. For example, if the protein has enzymatic activity, an enzyme assay may be used.
Alternatively, if polyclonal or monoclonal antibodies specific to the protein are available,
immunoassays may be employed using the antibodies to the protein. The techniques of
enzyme assay and immunoassay are well known to those skilled in the art.

WO 02/12478 PCT/US01/24335

The resulting protein may be recovered by methods known in the art. For example,
the protein may be recovered from the nutrient medium by procedures including, but not
limited to, centrifugation, filtration, extraction, spray-drying, evaporation, or precipitation.
The recovered protein may then be further purified by a variety of chromatographic
procedures, e.g., ion exchange chromatography, gel filtration chromatography, affinity
chromatography, or the like. Reverse-phase high performance liquid chromatography
(RP-HPLC), optionally employing hydrophobic RP-HPLC media, e.g., silica gel, can be used
to further purify the protein. Combinations of methods and means can also be employed to
provide a substantially purified recombinant polypeptide or protein.

In another preferred embodiment, overexpression of the GCPE protein in a
transgenic plant may provide tolerance to a variety of stresses, e.g., oxidative stress tolerance
such as to oxygen or ozone, UV tolerance, heat tolerance, drought tolerance, cold tolerance,
or fungal/microbial pathogen tolerance.

As used herein in a preferred aspect, tolerance or resistance to stress is determined
by the ability of a plant, when challenged by a stress such as cold, to produce a higher yield
than a plant without such tolerance or resistance. In a particularly preferred aspect of the
present invention, the tolerance or resistance to stress is measured relative to a plant with a
genetic background similar to that of the tolerant or resistant plant, except that the tolerant or
resistant plant expresses or overexpresses a GCPE protein.
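Under the yield-based definition above, tolerance reduces to comparing the yield of the
GCPE-expressing plant with that of its control under the same stress. This is a minimal
sketch; the function names and the yield figures are hypothetical and are not results from
this application.

```python
def percent_yield_advantage(yield_expressing: float, yield_control: float) -> float:
    """Percent yield difference under the same stress challenge between a plant
    expressing a GCPE protein and a control of similar genetic background."""
    if yield_control <= 0:
        raise ValueError("control yield must be positive")
    return 100.0 * (yield_expressing - yield_control) / yield_control


def is_stress_tolerant(yield_expressing: float, yield_control: float) -> bool:
    # Tolerance in the sense used above: a higher yield than the control under stress.
    return percent_yield_advantage(yield_expressing, yield_control) > 0


# Hypothetical yields (t/ha) after a cold challenge; not results from this application.
print(percent_yield_advantage(3.0, 2.5), is_stress_tolerant(3.0, 2.5))  # 20.0 True
```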

Host Cells and Organisms 

Preferred host plants and cells can be or be derived from alfalfa, apple, Arabidopsis,
banana, barley, Brassica, Brassica campestris, Brassica napus, broccoli, cabbage, canola,
castor bean, chrysanthemum, citrus, coconut, coffee, cotton, crambe, cranberry, cucumber,
Cuphea, dendrobium, dioscorea, eucalyptus, fescue, fir, garlic, gladiolus, grape, hordeum,
lentils, lettuce, liliacea, linseed, maize, millet, muskmelon, mustard, oat, oil palm, oilseed
rape, onion, an ornamental plant, papaya, pea, peanut, pepper, perennial ryegrass, Phaseolus,
pine, poplar, potato, rapeseed (including Canola and High Erucic Acid varieties), rice, rye,
safflower, sesame, sorghum, soybean, strawberry, sugarbeet, sugarcane, sunflower, tea,
tomato, triticale, turf grasses, and wheat.
In a more preferred embodiment, the host plants and cells are, or are derived from,
Brassica campestris, Brassica napus, canola, castor bean, coconut, cotton, crambe, linseed,
maize, mustard, oil palm, peanut, rapeseed (including Canola and High Erucic Acid
varieties), rice, safflower, sesame, soybean, sunflower, and wheat; in a particularly
preferred embodiment, they are, or are derived from, coconut, crambe, maize, oil palm, peanut,
rapeseed (including Canola and High Erucic Acid varieties), safflower, sesame, soybean, and
sunflower.

In another preferred embodiment, the plant or cell is, or is derived from, canola. In
another preferred embodiment, the plant or cell is, or is derived from, Brassica napus. In a
particularly preferred embodiment, the plant or cell is, or is derived from, soybean. The
soybean cell or plant is preferably a cell or plant of an elite soybean line.

Other preferred plants and plant host cells for use in the methods of the present
invention include, but are not limited to, Acacia, alfalfa, aneth, apple, apricot, artichoke,
arugula, asparagus, avocado, banana, barley, beet, blackberry, blueberry, broccoli, Brussels
sprouts, cabbage, canola, cantaloupe, carrot, cassava, cauliflower, celery, cherry, chicory,
cilantro, citrus, Clementines, coffee, corn, cotton, cucumber, Douglas fir, eggplant, endive,
escarole, eucalyptus, fennel, figs, garlic, gourd, grape, grapefruit, honeydew, jicama,
kiwifruit, lettuce, leeks, lemon, lime, Loblolly pine, mango, melon, nectarine, oat, oil palm,
oilseed rape, okra, onion, orange, an ornamental plant, papaya, parsley, pea, peach, peanut,
pear, pepper, persimmon, pine, pineapple, plantain, plum, pomegranate, poplar, potato,
pumpkin, quince, radiata pine, radicchio, radish, raspberry, rice, rye, sorghum, Southern pine,
soybean, spinach, squash, strawberry, sugarbeet, sugarcane, sunflower, sweet potato,
sweetgum, tangerine, tea, tobacco, tomato, triticale, turf, turnip, a vine, watermelon, wheat,
yams, and zucchini.

Mammalian cell lines available as hosts for expression are known in the art and
include many immortalized cell lines available from the American Type Culture Collection
(ATCC, Manassas, VA), such as HeLa cells, Chinese hamster ovary (CHO) cells, baby
hamster kidney (BHK) cells and a number of other cell lines.

The fungal host cell may, for example, be a yeast cell, a fungus, or a filamentous
fungal cell. In one embodiment, the fungal host cell is a yeast cell, and in a preferred
embodiment, the yeast host cell is a cell of a species of Candida, Kluyveromyces,
Saccharomyces, Schizosaccharomyces, Pichia, or Yarrowia. In another embodiment, the
fungal host cell is a filamentous fungal cell, and in a preferred embodiment, the filamentous
fungal host cell is a cell of a species of Acremonium, Aspergillus, Fusarium, Humicola,
Myceliophthora, Mucor, Neurospora, Penicillium, Thielavia, Tolypocladium, or
Trichoderma.
Suitable host bacteria include archaebacteria and eubacteria, especially eubacteria,
and most preferably Enterobacteriaceae. Examples of useful bacteria include Escherichia,
Enterobacter, Azotobacter, Erwinia, Bacillus, Pseudomonas, Klebsiella, Proteus,
Salmonella, Serratia, Shigella, Rhizobia, Vitreoscilla, and Paracoccus. Suitable E. coli hosts
include E. coli W3110 (ATCC 27325), E. coli 294 (ATCC 31446), E. coli B, and E. coli
χ1776 (ATCC 31537) (American Type Culture Collection, Manassas, Virginia). Mutant
cells of any of the above-mentioned bacteria may also be employed. These hosts may be
used with bacterial expression vectors such as the E. coli cloning and expression vector
Bluescript™ (Stratagene, La Jolla, CA); pIN vectors (Van Heeke and Schuster 1989); and
pGEX vectors (Promega, Madison, Wis.), which may be used to express foreign polypeptides
as fusion proteins with glutathione S-transferase (GST).

Preferred insect host cells are derived from Lepidopteran insects such as Spodoptera
frugiperda or Trichoplusia ni. The preferred Spodoptera frugiperda cell line is the cell line
Sf9 (ATCC CRL 1711). Other insect cell systems, such as the silkworm B. mori, can also be
used. These host cells are preferably used in combination with Baculovirus expression
vectors (BEVs), which are recombinant insect viruses in which the coding sequence for a
chosen foreign gene has been inserted behind a baculovirus promoter in place of the viral
gene, e.g., polyhedrin (U.S. Patent No. 4,745,051).


Methods for Introducing Nucleic Acid Molecules into Organisms 

Technology for introduction of nucleic acids into cells is well known to those of skill
in the art. Common methods include chemical methods, microinjection, electroporation
(U.S. Patent No. 5,384,253), particle acceleration, viral vectors, and receptor-mediated
mechanisms. Fungal cells may be transformed by a process involving protoplast formation,
transformation of the protoplasts and regeneration of the cell wall. The various techniques
for transforming mammalian cells are also well known.

Algal cells may be transformed by a variety of known techniques, including but not
limited to microprojectile bombardment, protoplast fusion, electroporation, microinjection,
and vigorous agitation in the presence of glass beads. Suitable procedures for transformation
of green algal host cells are described in EP 108580. A suitable method of transforming cells
of the diatom species Phaeodactylum tricornutum is described in WO 97/39106. Chlorophyll
c-containing algae may be transformed using the procedures described in U.S. Patent No.
5,661,017.

Methods for introducing nucleic acids into plants are also well known. Suitable
methods include bacterial infection (e.g., Agrobacterium), binary bacterial artificial
chromosome vectors, direct delivery of nucleic acids (e.g., via PEG-mediated
transformation), desiccation/inhibition-mediated nucleic acid uptake, electroporation,
agitation with silicon carbide fibers, and acceleration of nucleic acid coated particles, etc.
(reviewed in Potrykus et al., Ann. Rev. Plant Physiol. Plant Mol. Biol. 42:205, 1991). For
example, electroporation has been used to transform maize protoplasts.

Alternatively, nucleic acids can be introduced directly into pollen by injecting a
plant's reproductive organs. In another transformation technique, nucleic acids may be
injected into immature embryos. Plastids of higher plants can be stably transformed via
particle gun delivery of DNA containing a selectable marker and targeting of the DNA to the
plastid genome through homologous recombination (U.S. Patent Nos. 5,451,513 and
5,545,818).

Methods for transforming dicots, primarily by use of Agrobacterium tumefaciens, and
obtaining transgenic plants have been published for cotton, soybean, Brassica, peanut,
papaya, pea, and Arabidopsis thaliana. See, e.g., U.S. Patent Nos. 5,004,863, 5,159,135,
5,416,011, 5,463,174, 5,518,908, and 5,569,834. The latter method for transforming
Arabidopsis thaliana is commonly called "dipping," vacuum infiltration, or germplasm
transformation. Transformation of monocotyledons using electroporation, particle
bombardment, and Agrobacterium has also been reported. Transformation and plant
regeneration have been achieved in asparagus, barley, maize, oat, orchard grass, rice, rye,
sugarcane, tall fescue, and wheat.

Transformation of plant protoplasts can be achieved using methods based on calcium
phosphate precipitation, polyethylene glycol treatment, electroporation, and combinations of
these treatments. Application of these systems to different plant strains depends upon the
ability to regenerate that particular plant strain from protoplasts. Illustrative methods for the
regeneration of cereals from protoplasts are described in Abdullah et al., Biotechnology
4:1087 (1986); Fujimura et al., Plant Tissue Culture Letters 2:74 (1985); Toriyama et al.,
Theor. Appl. Genet. 205:34 (1986); and Yamada et al., Plant Cell Rep. 4:85 (1986).

To transform plant strains that cannot be successfully regenerated from protoplasts,
other ways to introduce DNA into intact cells or tissues can be used. For example, cereals
may be regenerated from immature embryos or explants. In addition, "particle gun" or
high-velocity microprojectile technology can be used. With this technology, DNA is
carried through the cell wall and into the cytoplasm on the surface of small metal particles.
The metal particles penetrate through several layers of cells and thus allow the
transformation of cells within tissue explants. A particular advantage of microprojectile
bombardment, in addition to its being an effective means of reproducibly transforming
monocots, is that it requires neither the isolation of protoplasts (Christou et al., Plant Physiol.
87:671-674, 1988) nor susceptibility to Agrobacterium infection. See also Yang and
Christou (eds.), Particle Bombardment Technology for Gene Transfer, Oxford Press, Oxford,
England (1994).

An illustrative embodiment of a method for delivering DNA into maize cells by
acceleration is a biolistics α-particle delivery system, which can be used to propel tungsten
particles coated with DNA through a screen, such as a stainless steel or Nytex screen, onto a
filter surface covered with corn cells cultured in suspension. Alternatively, immature
embryos or other target cells may be arranged on solid culture medium. The screen disperses
the nucleic acid-coated tungsten particles so that they are not delivered to the recipient cells
in large aggregates. A particle delivery system suitable for use with the invention is the
helium acceleration PDS-1000/He gun, which is available from Bio-Rad Laboratories
(Hercules, California).

Through the use of techniques set forth herein, one may obtain about 1000 or more
foci of cells transiently expressing a marker gene. The number of cells in a focus that
express the exogenous gene product 48 hours post-bombardment often ranges from one to
ten, and averages one to three.

In bombardment transformation, one may optimize the pre-bombardment culturing
conditions and the bombardment parameters to yield the maximum number of stable
transformants. Physical parameters to adjust include gap distance, flight distance, tissue
distance, and helium pressure. In addition, biological factors, such as the nature of the
transforming DNA (e.g., linearized DNA or intact supercoiled plasmids) and the manipulation
of cells before and immediately after bombardment, may affect transformation optimization.
It is believed that pre-bombardment manipulations are especially important for successful
transformation of immature embryos. One may also minimize trauma to the recipient cells by
modifying conditions that influence their physiological state and that may therefore influence
transformation and integration efficiencies. For example, the osmotic state, tissue hydration,
and the subculture stage or cell cycle of the recipient cells may be adjusted for optimum
transformation.

Agrobacterium-mediated transfer is a widely applicable system for introducing genes
into plant cells because the DNA can be introduced into whole plant tissues, thereby
bypassing the need for regeneration of an intact plant from a protoplast. Further, the
integration of the T-DNA is a relatively precise process resulting in few rearrangements.
The region of DNA to be transferred is defined by the border sequences, and the intervening
DNA is usually inserted into the plant genome as described (Spielmann et al., 1986).

Modern Agrobacterium transformation vectors are capable of replication in E. coli as
well as Agrobacterium, allowing for convenient manipulations. Moreover, technological
advances in vectors for Agrobacterium-mediated gene transfer have improved the
arrangement of genes and restriction sites in the vectors to facilitate the construction of vectors
capable of expressing various polypeptide coding genes. Available vectors have convenient
multi-linker regions flanked by a promoter and a polyadenylation site for direct expression of
inserted polypeptide coding genes and are suitable for present purposes. In addition,
Agrobacterium containing both armed and disarmed Ti genes can be used for the
transformations. In those plant strains where Agrobacterium-mediated transformation is
efficient, it is the method of choice because of the facile and defined nature of the gene
transfer.

A transgenic plant formed using Agrobacterium transformation methods typically
contains a single gene on one chromosome. Such transgenic plants can be referred to as
being heterozygous for the added gene. More preferred is a transgenic plant that is
homozygous for the added structural gene, i.e., a transgenic plant that contains two added
genes, one gene at the same locus on each chromosome of a chromosome pair. A
homozygous transgenic plant can be obtained by sexually mating (selfing) an independent
segregant transgenic plant that contains a single added gene, germinating some of the
resulting seed, and analyzing the resulting plants for the gene of interest.
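Why the progeny must be analyzed can be seen by enumerating the selfing cross. This
minimal sketch assumes simple Mendelian segregation of a single insertion with no
segregation distortion; the allele symbols `T` (insertion) and `-` (no insertion) and the
function name `self_progeny` are illustrative choices, not terms used in this application.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def self_progeny(parent: str) -> dict[str, Fraction]:
    """Progeny genotype frequencies from selfing a diploid at one locus.

    Each character of `parent` is one allele: 'T' = transgene insertion,
    '-' = no insertion. Assumes each gamete receives either allele with equal
    probability (simple Mendelian segregation, no distortion).
    """
    pairs = product(parent, parent)  # (egg allele, pollen allele)
    counts = Counter("".join(sorted(egg + pollen)) for egg, pollen in pairs)
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


progeny = self_progeny("T-")  # primary transformant carrying one insertion
carrier_share = sum(p for g, p in progeny.items() if "T" in g)
homozygous_among_carriers = progeny["TT"] / carrier_share
print(progeny["TT"], homozygous_among_carriers)  # 1/4 1/3
```

Under these assumptions only one quarter of the selfed progeny, and one third of those that
carry the transgene at all, are homozygous, which is why progeny testing is needed to pick
them out.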

Transgenic Plants 

Regeneration, development, and cultivation of plants from single plant protoplast
transformants or various transformed explants is taught in the art, e.g., by Weissbach and
Weissbach (eds.), Methods for Plant Molecular Biology, Academic Press, Inc., San Diego,
CA (1988); and Horsch et al., Science 227:1229-1231 (1985). There are a variety of methods
for the regeneration of plants from plant tissue. The particular method of regeneration will
depend on the starting plant tissue and the particular plant species to be regenerated.

Transformants are generally cultured in the presence of a selective medium that selects
for the successfully transformed cells and induces the regeneration of plant shoots. Such
shoots are typically obtained within two to four months. Shoots are then transferred to an
appropriate root-inducing medium containing the selective agent and an antibiotic to prevent
bacterial growth. Many of the shoots will develop roots, which are then transplanted to soil
or other media to allow the continued development of roots. The method, as outlined, will
generally vary depending on the particular plant employed.

Preferably, the regenerated transgenic plants are self-pollinated to provide
homozygous transgenic plants. Alternatively, pollen obtained from the regenerated
transgenic plants may be crossed with seed-grown or non-transgenic plants, preferably plants
of agronomically important lines. Conversely, pollen from seed-grown or non-transgenic
plants may be used to pollinate the regenerated transgenic plants. A transgenic plant of the
invention containing a desired polypeptide is cultivated using methods well known to one
skilled in the art.

A transgenic plant may pass along to its progeny the nucleic acid sequence that confers
the enhanced gene expression. The transgenic plant is preferably homozygous for that
nucleic acid sequence and transmits it to all of its offspring as a result of sexual
reproduction. Progeny may be grown from seeds produced by the transgenic plant. These
additional plants may then be self-pollinated to generate a true-breeding line of plants.

It is also to be understood that two different transgenic plants can be mated to
produce offspring that contain two independently segregating, exogenous genes. Selfing of
appropriate progeny can produce plants that are homozygous for both added, exogenous
genes that encode a polypeptide of interest. Back-crossing to a parental plant and
out-crossing with a non-transgenic plant are also contemplated, as is vegetative propagation.
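For two independently segregating transgenes, the same enumeration shows how rare the
double homozygote is among selfed progeny. This sketch assumes the two insertions are
unlinked and each is present in one copy in the selfed plant; the symbols `A` and `B` and
the function name are hypothetical.

```python
from fractions import Fraction
from itertools import product


def double_homozygote_fraction() -> Fraction:
    """Fraction of selfed progeny homozygous for two unlinked transgenes.

    The parent carries transgene A at one locus and transgene B at another,
    each in one copy. With independent segregation it makes four equally
    likely gametes: (A, B), (A, -), (-, B), (-, -).
    """
    gametes = list(product("A-", "B-"))
    hits = sum(
        1
        for egg, pollen in product(gametes, gametes)
        if egg == ("A", "B") and pollen == ("A", "B")
    )
    return Fraction(hits, len(gametes) ** 2)


print(double_homozygote_fraction())  # 1/16
```

The result equals (1/4) squared, i.e., the single-locus probability applied independently at
each locus.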

The progeny from these plants are evaluated, among other things, for gene
expression. Gene expression may be detected by several common methods, such as
western blotting, northern blotting, immunoprecipitation, and ELISA. Assays for gene
expression based on the transient expression of cloned nucleic acid constructs have been
developed by introducing the nucleic acid molecules into plant cells by polyethylene glycol
treatment, electroporation, or particle bombardment. Transient expression systems may be
used to functionally dissect gene constructs (see generally Maliga et al., Methods in Plant
Molecular Biology, A Laboratory Course Manual, Cold Spring Harbor Press, Cold Spring
Harbor, New York, 1995).

Any of the nucleic acid molecules of the invention may be introduced into a plant
cell in a permanent or transient manner in combination with other genetic elements such as
vectors, promoters, and enhancers. Further, any of the nucleic acid molecules of the
invention may be introduced into a plant cell in a manner that allows for expression or
overexpression of the protein or fragment thereof encoded by the nucleic acid molecule, for
cosuppression of an endogenous protein, or for posttranscriptional gene silencing of an
endogenous transcript. In addition, the activity of a protein in a plant cell may be reduced or
depressed by growing a transgenic plant cell containing a nucleic acid molecule whose
non-transcribed strand encodes the protein or a fragment thereof.

Cosuppression is the reduction in expression levels, usually at the level of RNA, of a
particular endogenous gene or gene family by the expression of a homologous sense
construct that is capable of transcribing mRNA of the same strandedness as the transcript of
the endogenous gene. Cosuppression may result from stable transformation with a
single-copy nucleic acid molecule that is homologous to a nucleic acid sequence found within
the cell, or with multiple copies of such a homologous nucleic acid molecule. Genes, even
though different, that are linked to homologous promoters may also be cosuppressed.

Posttranscriptional gene silencing (PTGS) can result in virus immunity or gene
silencing in plants. PTGS is induced by dsRNA and is mediated by an RNA-dependent RNA
polymerase, present in the cytoplasm, that requires a dsRNA template. The dsRNA is formed
by hybridization of complementary transgene mRNAs or complementary regions of the same
transcript. Duplex formation can be accomplished by using transcripts from one sense gene
and one antisense gene colocated in the plant genome, a single transcript that has
self-complementarity, or sense and antisense transcripts from genes brought together by
crossing. The RNA-dependent RNA polymerase makes a complementary strand from the
transgene mRNA, and RNase molecules attach to this complementary strand (cRNA). These
cRNA-RNase molecules hybridize to the endogene mRNA and cleave the single-stranded
RNA adjacent to the hybrid. The cleaved single-stranded RNAs are further degraded by other
host RNases because one will lack a capped 5' end and the other will lack a poly(A) tail. See
Waterhouse et al., PNAS 95:13959-13964 (1998).
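One common way to obtain "a single transcript that has self-complementarity" is an
inverted-repeat (hairpin) construct: a target fragment in sense orientation, a spacer, and
the same fragment in inverted orientation. The sketch below works on the DNA coding strand
and only checks that the two arms are complementary. The hairpin layout, the spacer, the
12-nt fragment, and all function names are assumptions for illustration, not constructs
described in this application.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def hairpin_construct(fragment: str, spacer: str) -> str:
    """Sense fragment, spacer, then the same fragment in inverted orientation.

    Transcribed as one RNA, the two arms are complementary and can fold back
    on each other to form a double-stranded stem.
    """
    return fragment.upper() + spacer.upper() + reverse_complement(fragment)


def arms_are_complementary(construct: str, arm_length: int) -> bool:
    # The 3' arm, read back as its reverse complement, should equal the 5' arm.
    left, right = construct[:arm_length], construct[-arm_length:]
    return reverse_complement(right) == left


# Hypothetical 12-nt target fragment and spacer; practical constructs use longer arms.
fragment = "ATGGCTTCAGGA"
construct = hairpin_construct(fragment, spacer="GTAAGTAC")
print(construct, arms_are_complementary(construct, len(fragment)))
```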

Antisense approaches are a way of preventing or reducing gene function by targeting
the genetic material. The objective of the antisense approach is to use a sequence
complementary to the target gene to block its expression and create a mutant cell line or
organism in which the level of a single chosen protein is selectively reduced or abolished.
Antisense techniques have several advantages over other 'reverse genetic' approaches. The
site of inactivation and its developmental effect can be manipulated by the choice of
promoter for antisense genes or by the timing of external application or microinjection.
The specificity of antisense inhibition can also be manipulated by selecting either unique
regions of the target gene or regions where it shares homology with other related genes.

In one embodiment, the process involves the introduction and expression of an
antisense gene sequence. Such a sequence is one in which part or all of the normal gene
sequence is placed under a promoter in inverted orientation so that the 'wrong' or
complementary strand is transcribed into a noncoding antisense RNA that hybridizes with the
target mRNA and interferes with its expression. An antisense vector can be constructed by
standard procedures and introduced into cells by transformation, transfection,
electroporation, microinjection, infection, etc. The type of transformation and choice of
vector will determine whether expression is transient or stable. The promoter used for the
antisense gene may influence the level, timing, tissue specificity, or inducibility of the
antisense inhibition.
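Placing a fragment in inverted orientation means the transcribed RNA is the reverse
complement of the normal mRNA, so the two can pair antiparallel. This minimal sketch
derives both RNAs from a coding-strand fragment and checks full complementarity; the
fragment `ATGGCTTCA` and the function names are hypothetical, and real antisense inhibition
does not require perfect pairing over the whole transcript.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def sense_mrna(coding_strand: str) -> str:
    """mRNA (5'->3') from a gene fragment in its normal orientation."""
    return coding_strand.upper().replace("T", "U")


def antisense_rna(coding_strand: str) -> str:
    """RNA (5'->3') transcribed when the same fragment is placed under the
    promoter in inverted orientation: the reverse complement, with U for T."""
    return coding_strand.upper().translate(DNA_COMPLEMENT)[::-1].replace("T", "U")


def fully_complementary(rna_a: str, rna_b: str) -> bool:
    """True if two 5'->3' RNAs can pair antiparallel over their whole length."""
    return len(rna_a) == len(rna_b) and all(
        RNA_PAIR[x] == y for x, y in zip(rna_a, reversed(rna_b))
    )


# Hypothetical coding-strand fragment, not a sequence from this application.
fragment = "ATGGCTTCA"
print(sense_mrna(fragment), antisense_rna(fragment))  # AUGGCUUCA UGAAGCCAU
print(fully_complementary(antisense_rna(fragment), sense_mrna(fragment)))  # True
```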

Feed, Meal, Protein and Oil Preparations


Plants or agents of the present invention can be utilized in methods, for example
without limitation, to obtain a seed that expresses a gcpE nucleic acid molecule in that seed,
to obtain a seed enhanced in a product of a gcpE gene, to obtain meal enhanced in a product
of a gcpE gene, to obtain feedstock enhanced in a product of a gcpE gene, and to obtain oil
enhanced in a product of a gcpE gene.

The present invention also provides for parts of the plants, particularly reproductive
or storage parts, of the present invention. Plant parts, without limitation, include seed,
endosperm, mesocarp, ovule and pollen. In a particularly preferred embodiment of the
present invention, the plant part is a seed. In one embodiment the seed is a constituent of
animal feed. In another embodiment, the plant part is a fruit, more preferably a fruit with
enhanced shelf life. In another preferred embodiment, the fruit has increased levels of a
tocopherol.

Plants utilized in such methods may be processed. A plant or plant part may be
separated or isolated from other plant parts. A preferred plant part for this purpose is a seed.
It is understood that even after separation or isolation from other plant parts, the isolated or
separated plant part may be contaminated with other plant parts. In a preferred aspect, the
separated plant part is greater than about 50% (w/w) of the separated material, more
preferably greater than about 75% (w/w), and even more preferably greater than about
90% (w/w). Plants or plant parts of the present invention generated by such methods may be
processed into products using known techniques.
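The purity levels above are simple mass ratios. This sketch computes the percent (w/w) of
the desired part in a separated lot and reports which of the stated levels it exceeds; it
treats "greater than about" as a strict cutoff, which is a simplifying assumption, and the
lot masses and function names are hypothetical.

```python
def purity_percent(part_mass: float, total_mass: float) -> float:
    """Desired plant part (e.g., seed) as percent w/w of the separated material."""
    if total_mass <= 0 or not 0 <= part_mass <= total_mass:
        raise ValueError("need 0 <= part_mass <= total_mass and total_mass > 0")
    return 100.0 * part_mass / total_mass


def purity_tier(percent: float) -> str:
    # Cutoffs follow the 50%, 75% and 90% (w/w) levels in the text, treated as
    # strict thresholds here even though the text says "about".
    for cutoff, label in ((90, "even more preferred"), (75, "more preferred"), (50, "preferred")):
        if percent > cutoff:
            return label
    return "below preferred levels"


# Hypothetical lot: 920 g of seed in 1000 g of separated material.
pct = purity_percent(920, 1000)
print(pct, purity_tier(pct))  # 92.0 even more preferred
```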

Preferred products are meal, feedstock, and oil. Methods to produce feed, meal,
protein and oil preparations are known in the art. See, e.g., U.S. Patents 4,957,748,
5,100,679, 5,219,596, 5,936,069, 6,005,076, 6,146,669, and 6,156,227. In a preferred
embodiment, the protein preparation is a high protein preparation. Such a high protein
preparation preferably has a protein content of greater than about 5% w/v, more preferably
about 10% w/v, and even more preferably about 15% w/v.

In a preferred embodiment, the oil preparation is a high oil preparation with an oil
content derived from a plant or part thereof of the present invention of greater than about 5%
w/v, more preferably greater than about 10% w/v, and even more preferably greater than
about 15% w/v. In a preferred embodiment the oil preparation is a liquid and of a volume
greater than about 1, 5, 10 or 50 liters. The present invention provides for oil produced from
plants of the present invention or generated by a method of the present invention. Such oil
may be a minor or major component of any resultant product. Moreover, such oil may be
blended with other oils.

In a preferred embodiment, the oil produced from plants of the present invention or
generated by a method of the present invention constitutes greater than about 0.5%, 1%, 5%,
10%, 25%, 50%, 75% or 90% by volume or weight of the oil component of any product. In
another embodiment, the oil preparation may be blended and can constitute greater than
about 10%, 25%, 35%, 50% or 75% of the blend by volume. Oil produced from a plant of
the present invention can be admixed with one or more organic solvents or petroleum
distillates.
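The blend percentages above are volume fractions. This minimal sketch computes the share
of a blend contributed by oil of the invention and lists which of the stated levels it
exceeds; it assumes the component volumes are additive, and the volumes and names are
hypothetical.

```python
THRESHOLDS = (10, 25, 35, 50, 75)  # percent-by-volume levels named in the text


def blend_share(invention_oil_l: float, other_oils_l: float) -> float:
    """Percent by volume of a blend contributed by oil of the invention
    (assumes volumes are additive on mixing)."""
    total = invention_oil_l + other_oils_l
    if invention_oil_l < 0 or other_oils_l < 0 or total == 0:
        raise ValueError("volumes must be non-negative and not both zero")
    return 100.0 * invention_oil_l / total


# Hypothetical blend: 4 L of oil from a plant of the invention plus 6 L of other oils.
share = blend_share(4.0, 6.0)
print(share, [t for t in THRESHOLDS if share > t])  # 40.0 [10, 25, 35]
```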

Seed containers 

Seeds of the plants may be placed in a container. As used herein, a container is any
object capable of holding such seeds. A container preferably contains greater than about 500,
1,000, 5,000, or 25,000 seeds where at least about 10%, 25%, 50%, 75% or 100% of the
seeds are derived from a plant of the present invention. The present invention also provides a
container of over about 10,000, more preferably about 20,000, and even more preferably
about 40,000 seeds where over about 10%, more preferably about 25%, more preferably 50%
and even more preferably about 75% or 90% of the seeds are seeds derived from a plant of
the present invention. The present invention also provides a container of over about 10 kg,
more preferably about 25 kg, and even more preferably about 50 kg seeds where over about
10%, more preferably about 25%, more preferably about 50% and even more preferably
about 75% or 90% of the seeds are seeds derived from a plant of the present invention.

E. Antibodies

One aspect of the invention concerns antibodies, single-chain antigen binding
molecules, or other proteins that specifically bind to one or more of the protein or peptide
molecules of the invention and their homologs, fusions or fragments. In a particularly
preferred embodiment, the antibody specifically binds to a protein having the amino acid
sequence set forth in SEQ ID NOs: 4, 48, 49 and 50, or an amino acid sequence encoded by a
nucleic acid sequence selected from the group consisting of SEQ ID NOs: 1 through 3 and 5
through 47. Such antibodies may be used to quantitatively or qualitatively detect the protein
or peptide molecules of the invention.

Nucleic acid molecules that encode all or part of the protein of the invention can be
expressed, via recombinant means, to yield protein or peptides that can in turn be used to
elicit antibodies that are capable of binding the expressed protein or peptide. Such antibodies
may be used in immunoassays for that protein. Such protein-encoding molecules, or their
fragments, may be a "fusion" molecule (i.e., a part of a larger nucleic acid molecule) such
that, upon expression, a fusion protein is produced. It is understood that any of the nucleic
acid molecules of the invention may be expressed, via recombinant means, to yield proteins
or peptides encoded by these nucleic acid molecules.

The antibodies that specifically bind proteins and protein fragments of the invention
may be polyclonal or monoclonal and may comprise intact immunoglobulins, antigen-binding
fragments of immunoglobulins (such as F(ab') and F(ab')2), or single-chain
immunoglobulins producible, for example, via recombinant means. It is understood that
practitioners are familiar with the standard resource materials that describe specific
conditions and procedures for the construction, manipulation and isolation of antibodies (see,
e.g., Harlow and Lane, in: Antibodies: A Laboratory Manual, Cold Spring Harbor Press,
Cold Spring Harbor, New York, 1988).

As discussed below, such antibody molecules or their fragments may be used for
diagnostic purposes. Where the antibodies are intended for diagnostic purposes, it may be
desirable to derivatize them, for example with a ligand group (such as biotin) or a detectable
marker group (such as a fluorescent group, a radioisotope or an enzyme).

The ability to produce antibodies that bind the protein or peptide molecules of the
invention permits the identification of mimetic compounds derived from those molecules.
These mimetic compounds may contain a fragment of the protein or peptide or merely a
structurally similar region and nonetheless exhibit an ability to specifically bind to
antibodies directed against that compound.

Antibodies have been expressed in plants. Cytoplasmic expression of a scFv (single-chain
Fv antibody) has been reported to delay infection by artichoke mottled crinkle virus.
Transgenic plants that express antibodies directed against endogenous proteins may exhibit a
physiological effect. For example, expressed anti-abscisic acid antibodies have been reported to
result in a general perturbation of seed development. See, e.g., Hiatt et al., Nature 342:76-78
(1989); Conrad and Fiedler, Plant Mol. Biol. 26:1023-1030 (1994); Phillips et al., EMBO J.
16:4489-4496 (1997); Marion-Poll, Trends in Plant Science 2:447-448 (1997).

Antibodies that are catalytic (abzymes) may also be expressed in plants. The
principle behind abzymes is that, because antibodies may be raised against many molecules,
this recognition ability can be directed toward generating antibodies that bind transition
states to force a chemical reaction forward. Persidis, Nature Biotechnology 15:1313-1315
(1997); Baca et al., Ann. Rev. Biophys. Biomol. Struct. 26:461-493 (1997). The catalytic
abilities of abzymes may be enhanced by site-directed mutagenesis. Examples of abzymes
are set forth in U.S. Patent Nos. 5,658,753; 5,632,990; 5,631,137; 5,602,015;
5,559,538; 5,576,174; 5,500,358; 5,318,897; 5,298,409; 5,258,289; and 5,194,585. It is
understood that any of the antibodies of the invention may be expressed in plants and that
such expression can result in a physiological effect. It is also understood that any of the
expressed antibodies may be catalytic.

F. Markers 

Another subset of the nucleic acid molecules of the invention includes nucleic acid
molecules that are markers. The markers can be used in a number of ways in the field of
molecular genetics. Such markers include nucleic acid molecules SEQ ID NOs: 1 through 3
and 5 through 47 or complements thereof or fragments of either that can act as markers and
other nucleic acid molecules of the present invention that can act as markers.

Genetic markers of the invention include "dominant" or "codominant" markers.
"Codominant markers" reveal the presence of two or more alleles (two per diploid
individual) at a locus. "Dominant markers" reveal the presence of only a single allele per
locus. The presence of the dominant marker phenotype (e.g., a band of DNA) is an
indication that one allele is in either the homozygous or heterozygous condition. The
absence of the dominant marker phenotype (e.g., absence of a DNA band) is merely evidence
that "some other" undefined allele is present. In the case of populations where individuals
are predominantly homozygous and loci are predominantly dimorphic, dominant and
codominant markers can be equally valuable. As populations become more heterozygous
and multi-allelic, codominant markers often become more informative of the genotype than
dominant markers. Marker molecules can be, for example, capable of detecting
polymorphisms such as single nucleotide polymorphisms (SNPs).

The genomes of animals and plants naturally undergo spontaneous mutation in the 
course of their continuing evolution. A "polymorphism" is a variation or difference in the 
sequence of the gene or its flanking regions that arises in some of the members of a species. 
The variant sequence and the "original" sequence co-exist in the species' population. In
some instances, such co-existence is in stable or quasi-stable equilibrium. 

A polymorphism is thus said to be "allelic," in that, due to the existence of the
polymorphism, some members of a species may have the original sequence (i.e., the original
"allele") whereas other members may have the variant sequence (i.e., the variant "allele"). In
the simplest case, only one variant sequence may exist and the polymorphism is thus said to
be di-allelic. In other cases, the species' population may contain multiple alleles and the
polymorphism is termed tri-allelic, etc. A single gene may have multiple different unrelated
polymorphisms. For example, it may have a di-allelic polymorphism at one site and a
multi-allelic polymorphism at another site.
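To make the site-by-site terminology concrete, the following Python sketch counts how many distinct alleles occur at each position of a set of aligned sequences and labels each position accordingly. It is a minimal illustration that assumes equal-length, gap-free alignments, so it only captures single-nucleotide differences and not insertions or deletions; the sequences and the function names `classify_site` and `classify_gene` are hypothetical.

```python
def classify_site(alleles: list[str]) -> str:
    """Label one aligned position by the number of distinct alleles seen there."""
    k = len(set(alleles))
    if k == 1:
        return "monomorphic"
    if k == 2:
        return "di-allelic"
    if k == 3:
        return "tri-allelic"
    return f"multi-allelic ({k} alleles)"


def classify_gene(aligned: list[str]) -> list[str]:
    """Classify every position of equal-length, gap-free aligned sequences."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    # Collect the base each member carries at position i, then classify it.
    return [classify_site([s[i] for s in aligned]) for i in range(length)]


# Hypothetical sequences of the same gene region from three members of a species.
members = ["ACGTA", "ACGTT", "AGGTC"]
print(classify_gene(members))
```

In this hypothetical example, the same five-base region carries a di-allelic polymorphism at the second position and a tri-allelic one at the fifth, matching the situation described above.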

The variation that defines the polymorphism may range from a single nucleotide
variation to the insertion or deletion of extended regions within a gene. In some cases, the
DNA sequence variations are in regions of the genome that are characterized by short tandem
repeats (STRs) that include tandem di- or tri-nucleotide repeat motifs.
Polymorphisms characterized by such tandem repeats are referred to as "variable number
tandem repeat" (VNTR) polymorphisms. VNTRs have been used in identity analysis (EP
370719; U.S. Patent Nos. 5,075,217 and 5,175,082; WO 91/14003).
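Alleles of a VNTR locus differ in how many copies of the repeat unit they carry. The sketch below counts uninterrupted copies of a motif starting at a known position; it assumes the repeat start is already known and that the repeat contains no interrupting bases. The sequences and the helper name `count_tandem_repeats` are hypothetical.

```python
def count_tandem_repeats(seq: str, motif: str, start: int) -> int:
    """Count uninterrupted copies of `motif` beginning at index `start`."""
    if not motif:
        raise ValueError("motif must be non-empty")
    copies = 0
    pos = start
    # Step through the sequence one motif length at a time while it matches.
    while seq.startswith(motif, pos):
        copies += 1
        pos += len(motif)
    return copies


# Two hypothetical alleles of one locus: identical flanks, different numbers
# of the dinucleotide repeat "CA".
allele_1 = "GGAT" + "CA" * 5 + "TTGC"
allele_2 = "GGAT" + "CA" * 8 + "TTGC"
print(count_tandem_repeats(allele_1, "CA", 4))  # 5
print(count_tandem_repeats(allele_2, "CA", 4))  # 8
```

Because the two alleles differ only in repeat number, they also differ in length (here by 6 bp), which is what makes VNTRs detectable by fragment sizing.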

The detection of polymorphic sites in a sample of DNA may be facilitated through
the use of nucleic acid amplification methods. Such methods specifically increase the
concentration of polynucleotides that span the polymorphic site, or include that site and
sequences located either distal or proximal to it. Such amplified molecules can be readily
detected by gel electrophoresis or other means.

In an alternative embodiment, such polymorphisms can be detected through the use
of a marker nucleic acid molecule that is physically linked to such polymorphism(s). For this
purpose, marker nucleic acid molecules comprising a nucleotide sequence of a
polynucleotide located within 1 Mb of the polymorphism(s), more preferably within
100 kb of the polymorphism(s), and most preferably within 10 kb of the polymorphism(s) can
be employed. Alternatively, marker nucleic acid molecules comprising a nucleotide
sequence of a polynucleotide located within 25 cM of the polymorphism(s), more
preferably within 15 cM of the polymorphism(s), and most preferably within 5 cM of the
polymorphism(s) can be employed.

A polymorphism can be identified in a variety of ways. By
correlating its presence or absence in a plant with the presence or absence of a
phenotype, it is possible to predict the phenotype of that plant. If a polymorphism creates or
destroys a restriction endonuclease cleavage site, or if it results in the loss or insertion of
DNA (e.g., a VNTR polymorphism), it will alter the size or profile of the DNA fragments
that are generated by digestion with that restriction endonuclease. As such, organisms that
possess a variant sequence can be distinguished from those having the original sequence by
restriction fragment analysis. Polymorphisms that can be identified in this manner are
termed "restriction fragment length polymorphisms" (RFLPs) (UK Patent Application
2135774; WO 90/13668; WO 90/11369).
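The effect of a site-destroying change on fragment sizes can be sketched in a few lines of Python. The example assumes linear DNA, a single enzyme, and EcoRI's recognition sequence GAATTC, which is cut after the first G; only one strand is searched, which is sufficient for a palindromic site like this one. The sequences and the `digest` helper are hypothetical.

```python
def digest(seq: str, site: str, cut_offset: int) -> list[int]:
    """Return fragment lengths from cutting linear DNA at every occurrence of `site`.

    `cut_offset` is the cut position within the site (1 for EcoRI, G^AATTC).
    Only the given strand is searched, adequate for palindromic sites.
    """
    cuts = []
    i = seq.find(site)
    while i != -1:
        cuts.append(i + cut_offset)
        i = seq.find(site, i + 1)
    # Fragment boundaries are the two ends of the molecule plus every cut.
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# Hypothetical 30-bp sequences; a single A->G change destroys the EcoRI site.
original = "A" * 10 + "GAATTC" + "A" * 14
variant = "A" * 10 + "GAGTTC" + "A" * 14
print(digest(original, "GAATTC", 1))  # [11, 19]
print(digest(variant, "GAATTC", 1))   # [30]
```

The original sequence yields two fragments while the variant yields one uncut molecule, which is the fragment-length difference that restriction fragment analysis detects.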

Polymorphisms can also be identified by Single Strand Conformation Polymorphism
(SSCP) analysis, random amplified polymorphic DNA (RAPD), and cleaved amplified
polymorphic sequences (CAPS). See, e.g., Lee et al., Anal. Biochem. 205:289-293 (1992);
Sarkar et al., Genomics 13:441-443 (1992); Williams et al., Nucl. Acids Res. 18:6531-6535
(1990); and Lyamichev et al., Science 260:778-783 (1993). It is understood that one or more
of the nucleic acids of the invention may be utilized as markers or probes to detect
polymorphisms by SSCP, RAPD or CAPS analysis.

Polymorphisms may also be found using a DNA fingerprinting technique called
amplified fragment length polymorphism (AFLP), which is based on the selective PCR
amplification of restriction fragments from a total digest of genomic DNA to profile that
DNA. Vos et al., Nucleic Acids Res. 23:4407-4414 (1995). This method allows for the
specific co-amplification of high numbers of restriction fragments, which can be amplified
by PCR and visualized without prior knowledge of the nucleic acid sequence. It is understood
that one or more of the nucleic acids of the invention may be utilized as markers or probes to
detect polymorphisms by AFLP analysis or for fingerprinting RNA.

Single Nucleotide Polymorphisms (SNPs) generally occur at greater frequency than
other polymorphic markers and are spaced with a greater uniformity throughout a genome
than other reported forms of polymorphism. The greater frequency and uniformity of SNPs
means that there is greater probability that such a polymorphism will be found near or in a
genetic locus of interest than would be the case for other polymorphisms. SNPs are located
in protein-coding regions and noncoding regions of a genome. Some of these SNPs may
result in defective or variant protein expression (e.g., as a result of mutations or defective
splicing). Analysis (genotyping) of characterized SNPs can require only a plus/minus assay
rather than a lengthy measurement, permitting easier automation.

SNPs can be characterized using any of a variety of methods. Such methods include
the direct or indirect sequencing of the site, the use of restriction enzymes, enzymatic and
chemical mismatch assays, allele-specific PCR, ligase chain reaction, single-strand
conformation polymorphism analysis, single base primer extension (U.S. Patent Nos.
6,004,744 and 5,888,819), solid-phase ELISA-based oligonucleotide ligation assays, dideoxy
fingerprinting, oligonucleotide fluorescence-quenching assays, the 5'-nuclease allele-specific
hybridization TaqMan™ assay, the template-directed dye-terminator incorporation (TDI) assay
(Chen and Kwok, Nucl. Acids Res. 25:347-353, 1997), the allele-specific molecular beacon
assay (Tyagi et al., Nature Biotech. 16:49-53, 1998), the PinPoint assay (Haff and Smirnov,
Genome Res. 7:378-388, 1997), dCAPS analysis (Neff et al., Plant J. 14:387-392, 1998),
pyrosequencing (Ronaghi et al., Analytical Biochemistry 267:65-71, 1999; WO 98/13523;
WO 98/28440; and www.pyrosequencing.com), mass spectrometry, e.g., the Masscode™
system (WO 99/05319; WO 98/26095; WO 98/12355; WO 97/33000; WO 97/27331;
www.rapigene.com; and U.S. Patent No. 5,965,363), invasive cleavage of oligonucleotide
probes, and high-density oligonucleotide arrays (Hacia et al., Nature Genetics 22:164-167;
www.affymetrix.com).

Polymorphisms may also be detected using allele-specific oligonucleotides (ASOs),
which can be used, for example, in combination with hybridization-based technology,
including Southern, northern, and dot blot hybridizations, reverse dot blot hybridizations, and
hybridizations performed on microarrays and related technology.

The stringency of hybridization for polymorphism detection is highly dependent
upon a variety of factors, including length of the allele-specific oligonucleotide, sequence
composition, degree of complementarity (i.e., presence or absence of base mismatches),
concentration of salts and other factors such as formamide, and temperature. These factors
are important both during the hybridization itself and during subsequent washes performed to
remove target polynucleotide that is not specifically hybridized. In practice, the conditions
of the final, most stringent wash are most critical. In addition, the amount of target
polynucleotide that is able to hybridize to the allele-specific oligonucleotide is also governed
by such factors as the concentration of both the ASO and the target polynucleotide, the
presence and concentration of factors that act to "tie up" water molecules, so as to effectively
concentrate the reagents (e.g., PEG, dextran, dextran sulfate, etc.), whether the nucleic acids
are immobilized or in solution, and the duration of hybridization and washing steps.

Hybridizations are preferably performed below the melting temperature (Tm) of the
ASO. The closer the hybridization and/or washing step is to the Tm, the higher the
stringency. Tm for an oligonucleotide may be approximated, for example, according to the
following formula:

$$T_m = 81.5 + 16.6 \times \log_{10}[\mathrm{Na}^+] + 0.41 \times (\%\mathrm{G{+}C}) - \frac{675}{n}$$

where [Na+] is the molar salt concentration of Na+ or any other suitable cation and n is the
number of bases in the oligonucleotide. Other formulas for approximating Tm are available
and are known to those of ordinary skill in the art.
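The Tm formula above can be evaluated directly when choosing hybridization and wash temperatures for an ASO. The following is a minimal Python sketch of that one formula only; it assumes the sequence contains only A, C, G and T and that the cation concentration is given in mol/L. The function name `oligo_tm` and the example 20-mer are hypothetical.

```python
import math


def oligo_tm(seq: str, cation_molar: float) -> float:
    """Approximate Tm (deg C): 81.5 + 16.6*log10([Na+]) + 0.41*(%G+C) - 675/n.

    cation_molar is the molar concentration of Na+ (or another suitable cation).
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0 or cation_molar <= 0:
        raise ValueError("need a non-empty sequence and a positive salt concentration")
    gc_percent = 100.0 * sum(base in "GC" for base in seq) / n
    return 81.5 + 16.6 * math.log10(cation_molar) + 0.41 * gc_percent - 675.0 / n


# Hypothetical 20-mer ASO (50% G+C) at two salt concentrations.
aso = "ACGTACGTACGTACGTACGT"
print(round(oligo_tm(aso, 1.0), 2))  # 68.25
print(round(oligo_tm(aso, 0.1), 2))  # 51.65
```

Lowering the salt concentration tenfold lowers the estimated Tm by 16.6 degrees, so at a fixed wash temperature a lower-salt wash is closer to the Tm and therefore more stringent.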
Stringency is preferably adjusted so as to allow a given ASO to differentially
hybridize to a target polynucleotide of the correct allele and a target polynucleotide of the
incorrect allele. Preferably, there will be at least a two-fold differential between the signal
produced by the ASO hybridizing to a target polynucleotide of the correct allele and the level
of the signal produced by the ASO cross-hybridizing to a target polynucleotide of the
incorrect allele (e.g., an ASO specific for a mutant allele cross-hybridizing to a wild-type
allele). In more preferred embodiments of the present invention, there is at least a five-fold
signal differential. In highly preferred embodiments of the present invention, there is at least
an order of magnitude signal differential between the ASO hybridizing to a target
polynucleotide of the correct allele and the level of the signal produced by the ASO
cross-hybridizing to a target polynucleotide of the incorrect allele. While certain methods for
detecting polymorphisms are described herein, other detection methodologies may be
utilized.
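The two-fold, five-fold and order-of-magnitude criteria amount to a ratio test on the two signals. The sketch below computes that ratio and reports which tier it meets; it assumes background-corrected signal intensities in the same arbitrary units, and the function name `signal_differential` and the example numbers are hypothetical.

```python
def signal_differential(correct: float, cross: float) -> tuple[float, str]:
    """Return the fold differential (correct-allele / wrong-allele signal) and its tier."""
    if correct < 0 or cross <= 0:
        raise ValueError("signals must be non-negative and the cross-hybridization signal positive")
    fold = correct / cross
    if fold >= 10:
        tier = "highly preferred (at least an order of magnitude)"
    elif fold >= 5:
        tier = "more preferred (at least five-fold)"
    elif fold >= 2:
        tier = "preferred (at least two-fold)"
    else:
        tier = "below the two-fold minimum"
    return fold, tier


# Hypothetical background-corrected intensities (arbitrary units).
print(signal_differential(1200.0, 100.0))
print(signal_differential(300.0, 100.0))
```

If a measured ratio falls below two-fold, the text above suggests adjusting stringency (for example, the final wash conditions) rather than accepting the call.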

The present invention includes and provides a method for detecting a polymorphism
in a plant whose presence is predictive of a mutation affecting a level or pattern of a protein
comprising: (A) incubating under conditions permitting nucleic acid hybridization: (i) a
marker nucleic acid molecule having a nucleic acid sequence that hybridizes to a sequence
selected from the group consisting of SEQ ID NOs: 1 through 3, 5 through 47, and
complements thereof; and (ii) a complementary nucleic acid molecule obtained from a
sample, wherein nucleic acid hybridization between the marker nucleic acid molecule and the
complementary nucleic acid molecule permits the detection of a polymorphism; (B)
permitting hybridization between the marker nucleic acid molecule and the complementary
nucleic acid molecule; and (C) detecting the presence of the polymorphism, wherein the
detection of the polymorphism is predictive of the mutation.

The present invention includes and provides a method of determining a degree of
association between a polymorphism and a plant trait comprising: (A) hybridizing a nucleic
acid molecule specific for the polymorphism to genetic material of a plant, wherein the
nucleic acid molecule has a sequence selected from the group consisting of SEQ ID NOs: 1
through 3, 5 through 47, complements thereof, and fragments of these sequences; and (B)
calculating the degree of association between the polymorphism and the plant trait.

The present invention includes and provides a method of isolating a nucleic acid that
encodes a protein or fragment thereof comprising: (A) incubating under conditions
permitting nucleic acid hybridization: (i) a first nucleic acid molecule comprising a sequence
selected from the group consisting of SEQ ID NOs: 1 through 3, 5 through 47, complements
thereof, and fragments of these sequences; and (ii) a complementary second nucleic acid
molecule obtained from a plant cell or plant tissue; (B) permitting hybridization between the
first nucleic acid molecule and the second nucleic acid molecule obtained from the plant cell
or plant tissue; and (C) isolating the second nucleic acid molecule.

G. Plant Breeding 

Plants of the present invention can be part of or generated from a breeding program.
The choice of breeding method depends on the mode of plant reproduction, the heritability of
the trait(s) being improved, and the type of cultivar used commercially (e.g., F1 hybrid
cultivar, pureline cultivar, etc.). Selected, non-limiting approaches for breeding the plants of
the present invention are set forth below. A breeding program can be enhanced using
marker-assisted selection of the progeny of any cross. It is further understood that any
commercial and non-commercial cultivars can be utilized in a breeding program. Factors such
as, for example, emergence vigor, vegetative vigor, stress tolerance, disease resistance,
branching, flowering, seed set, seed size, seed density, standability, and threshability will
generally dictate the choice.

For highly heritable traits, a choice of superior individual plants evaluated at a single
location will be effective, whereas for traits with low heritability, selection should be based
on mean values obtained from replicated evaluations of families of related plants. Popular
selection methods commonly include pedigree selection, modified pedigree selection, mass
selection, and recurrent selection. In a preferred embodiment, a backcross or recurrent
breeding program is undertaken.

The complexity of inheritance influences the choice of breeding method. Backcross
breeding can be used to transfer one or a few favorable genes for a highly heritable trait into
a desirable cultivar. This approach has been used extensively for breeding disease-resistant
cultivars. Various recurrent selection techniques are used to improve quantitatively inherited
traits controlled by numerous genes. The use of recurrent selection in self-pollinating crops
depends on the ease of pollination, the frequency of successful hybrids from each pollination,
and the number of hybrid offspring from each successful cross.

Breeding lines can be tested and compared to appropriate standards in environments
representative of the commercial target area(s) for two or more generations. The best lines
are candidates for new commercial cultivars; those still deficient in traits may be used as
parents to produce new populations for further selection.

One method of identifying a superior plant is to observe its performance relative to
other experimental plants and to a widely grown standard cultivar. If a single observation is
inconclusive, replicated observations can provide a better estimate of its genetic worth. A
breeder can select and cross two or more parental lines, followed by repeated selfing and
selection, producing many new genetic combinations.

The development of new cultivars requires the development and selection of
varieties, the crossing of these varieties and the selection of superior hybrid crosses. The
hybrid seed can be produced by manual crosses between selected male-fertile parents or by
using male sterility systems. Hybrids are selected for certain single gene traits such as pod
color, flower color, seed yield, pubescence color, or herbicide resistance, which indicate that
the seed is truly a hybrid. Additional data on parental lines, as well as the phenotype of the
hybrid, influence the breeder's decision whether to continue with the specific hybrid cross.

Pedigree breeding and recurrent selection breeding methods can be used to develop
cultivars from breeding populations. Breeding programs combine desirable traits from two
or more cultivars or various broad-based sources into breeding pools from which cultivars
are developed by selfing and selection of desired phenotypes. New cultivars can be
evaluated to determine which have commercial potential.

Pedigree breeding is used commonly for the improvement of self-pollinating crops.
Two parents who possess favorable, complementary traits are crossed to produce an F1. An
F2 population is produced by selfing one or several F1s. Selection of the best individuals
from the best families is carried out. Replicated testing of families can begin in the F4
generation to improve the effectiveness of selection for traits with low heritability. At an
advanced stage of inbreeding (i.e., F6 and F7), the best lines or mixtures of phenotypically
similar lines are tested for potential release as new cultivars.
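Lines are tested for release only at an advanced stage such as F6 or F7 because heterozygosity is lost gradually under selfing: with no selection, each generation of selfing halves the expected heterozygous fraction at a segregating locus, so generation Fn retains (1/2)^(n-1) of it. The sketch below tabulates this standard Mendelian expectation; it assumes an F1 heterozygous at the locus, strict selfing, and no selection, and the function name `heterozygous_fraction` is hypothetical.

```python
from fractions import Fraction


def heterozygous_fraction(generation: int) -> Fraction:
    """Expected heterozygous fraction at a segregating locus in generation F_n.

    Assumes an F1 heterozygous at the locus, selfing every generation,
    and no selection; each round of selfing halves the heterozygous share.
    """
    if generation < 1:
        raise ValueError("generation must be >= 1 (the F1)")
    return Fraction(1, 2) ** (generation - 1)


for n in range(1, 8):
    het = heterozygous_fraction(n)
    print(f"F{n}: heterozygous {het}, homozygous {1 - het}")
```

Exact fractions are used so that, for example, the F6 value is reported as 1/32 rather than a rounded percentage.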

Backcross breeding has been used to transfer genes for a simply inherited, highly
heritable trait into a desirable homozygous cultivar or inbred line, which is the recurrent
parent. The source of the trait to be transferred is called the donor parent. After the initial
cross, individuals possessing the phenotype of the donor parent are selected and repeatedly
crossed (backcrossed) to the recurrent parent. The resulting plant is expected to have the
attributes of the recurrent parent (e.g., cultivar) and the desirable trait transferred from the
donor parent.
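Why repeated backcrossing recovers the recurrent parent can be made quantitative. Under the usual textbook expectation, the F1 carries half of the recurrent parent's genome and each backcross halves the remaining donor contribution, so after n backcrosses the expected recurrent-parent share is 1 - (1/2)^(n+1). The sketch below assumes loci unlinked to the selected gene and no selection for recurrent-parent type; in practice, donor segments linked to the transferred gene persist longer (linkage drag). The function name `recurrent_parent_fraction` is hypothetical.

```python
from fractions import Fraction


def recurrent_parent_fraction(backcrosses: int) -> Fraction:
    """Expected share of the genome derived from the recurrent parent.

    backcrosses = 0 is the initial F1 (one half); each further backcross
    to the recurrent parent halves the remaining donor contribution.
    Ignores linkage drag and selection.
    """
    if backcrosses < 0:
        raise ValueError("backcrosses must be >= 0")
    return 1 - Fraction(1, 2) ** (backcrosses + 1)


for n in range(0, 7):
    share = recurrent_parent_fraction(n)
    label = "F1" if n == 0 else f"after {n} backcross(es)"
    print(f"{label}: {share} ({float(share):.4f})")
```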

The single-seed descent procedure in the strict sense refers to planting a segregating
population, harvesting a sample of one seed per plant, and using the one-seed sample to plant
the next generation. When the population has been advanced from the F2 to the desired level
of inbreeding, the plants from which lines are derived will each trace to different F2
individuals. The number of plants in a population declines each generation due to failure of
some seeds to germinate or some plants to produce at least one seed. As a result, not all of
the F2 plants originally sampled in the population will be represented by a progeny when
generation advance is completed.

In a multiple-seed procedure, breeders commonly harvest one or more pods from
each plant in a population and thresh them together to form a bulk. Part of the bulk is used to
plant the next generation and part is put in reserve. The procedure has been referred to as
modified single-seed descent or the pod-bulk technique. The multiple-seed procedure has
been used to save labor at harvest. It is considerably faster to thresh pods with a machine
than to remove one seed from each by hand for the single-seed procedure. The multiple-seed
procedure also makes it possible to plant the same number of seeds of a population each
generation of inbreeding.

Descriptions of other breeding methods that are commonly used for different traits
and crops can be found in one of several reference books (e.g., Fehr, Principles of Cultivar
Development, Vol. 1 (1987)).

A transgenic plant of the present invention may also be reproduced using apomixis.
Apomixis is a genetically controlled method of reproduction in plants where the embryo is
formed without union of an egg and a sperm. There are three basic types of apomictic
reproduction: 1) apospory, where the embryo develops from a chromosomally unreduced egg
in an embryo sac derived from the nucellus; 2) diplospory, where the embryo develops from
an unreduced egg in an embryo sac derived from the megaspore mother cell; and
3) adventitious embryony, where the embryo develops directly from a somatic cell. In most
forms of apomixis, pseudogamy, or fertilization of the polar nuclei to produce endosperm, is
necessary for seed viability. In apospory, a nurse cultivar can be used as a pollen source for
endosperm formation in seeds. The nurse cultivar does not affect the genetics of the
aposporous apomictic cultivar because the unreduced egg of the cultivar develops
parthenogenetically, but makes possible endosperm production. Apomixis is economically
important, especially in transgenic plants, because it causes any genotype, no matter how
heterozygous, to breed true. Thus, with apomictic reproduction, heterozygous transgenic
plants can maintain their genetic fidelity throughout repeated life cycles. Methods for the
production of apomictic plants are known in the art. See, e.g., U.S. Patent No. 5,811,636.

Requirements for marker-assisted selection in a plant breeding program are: (1) the
marker(s) should co-segregate or be closely linked with the desired trait; (2) an efficient
means of screening large populations for the molecular marker(s) should be available; and
(3) the screening technique should have high reproducibility across laboratories and
preferably be economical to use and user-friendly.
The genetic linkage of marker molecules can be established by a gene mapping
model such as, without limitation, the flanking marker model reported by Lander and
Botstein, Genetics 121:185-199 (1989), and the interval mapping model, based on maximum
likelihood methods described by Lander and Botstein, and implemented in the software
package MAPMAKER/QTL (Lincoln and Lander, Mapping Genes Controlling Quantitative
Traits Using MAPMAKER/QTL, Whitehead Institute for Biomedical Research,
Massachusetts, 1990). Additional software includes Qgene, Version 2.23 (1996)
(Department of Plant Breeding and Biometry, 266 Emerson Hall, Cornell University, Ithaca,
NY). Use of Qgene software is a particularly preferred approach.
A maximum likelihood estimate (MLE) for the presence of a QTL is calculated,
together with an MLE assuming no QTL effect, to avoid false positives. A log10 of an odds
ratio (LOD) is then calculated as:

$$\mathrm{LOD} = \log_{10}\left(\frac{\text{MLE for the presence of a QTL}}{\text{MLE given no linked QTL}}\right)$$
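Because the likelihood of a whole mapping population is usually a vanishingly small number, the LOD ratio is normally computed from log-likelihoods rather than raw likelihoods. The sketch below converts two maximized natural-log likelihoods into a LOD score and compares it against the thresholds named in the preferred embodiment that follows; it assumes the log-likelihoods come from some mapping software, and the function names and example values are hypothetical.

```python
import math


def lod_score(loglik_qtl: float, loglik_no_qtl: float) -> float:
    """LOD = log10(L_QTL / L_no_QTL), computed from natural-log likelihoods.

    Subtracting log-likelihoods avoids the floating-point underflow that
    dividing two tiny likelihoods directly would risk.
    """
    return (loglik_qtl - loglik_no_qtl) / math.log(10)


def meets_threshold(lod: float, threshold: float = 3.0) -> bool:
    """True if the LOD score exceeds the chosen significance threshold."""
    return lod > threshold


# Hypothetical maximized log-likelihoods with and without a QTL in the interval.
lod = lod_score(-1230.0, -1239.21)
for threshold in (2.0, 2.5, 3.0, 4.0):
    print(threshold, round(lod, 3), meets_threshold(lod, threshold))
```

A LOD of 3.0 means the data are 1,000 times more likely under the QTL model than under the null model; which threshold is appropriate depends on marker number and genome length, as discussed below.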

The LOD score essentially indicates how much more likely the data are to have
arisen assuming the presence of a QTL than in its absence. The LOD threshold value for
avoiding a false positive with a given confidence, say 95%, depends on the number of
markers and the length of the genome. Graphs indicating LOD thresholds are set forth in
Lander and Botstein, supra, and further described by Arus and Moreno-Gonzalez, Plant
Breeding (Hayward et al., eds.), Chapman & Hall, London, pp. 314-331 (1993).

In a preferred embodiment of the present invention, the nucleic acid marker exhibits a
LOD score of greater than about 2.0, more preferably greater than about 2.5, and even more
preferably greater than about 3.0 or 4.0 with the trait or phenotype of interest. In a preferred
embodiment, the trait of interest is altered tocopherol levels or compositions.

Additional models can be used. Many modifications and alternative approaches to
interval mapping have been reported, including the use of non-parametric methods. Kruglyak
and Lander, Genetics 139:1421-1428 (1995). Multiple regression methods or models can
also be used, in which the trait is regressed on a large number of markers. Weber and Wricke,
Advances in Plant Breeding, Blackwell, Berlin (1994). Procedures may combine interval
mapping with regression analysis, whereby the phenotype is regressed onto a single putative
QTL at a given marker interval and at the same time onto a number of markers that serve as
'cofactors.' Generally, the use of cofactors reduces the bias and sampling error of the
estimated QTL positions, thereby improving the precision and efficiency of QTL mapping.
Zeng, Genetics 136:1457-1468 (1994). These models can be extended to multi-environment
experiments to analyze genotype-environment interactions. Jansen et al., Theor. Appl. Genet.
91:33-37 (1995).
It is understood that one or more of the nucleic acid molecules of the invention may
be used as molecular markers. It is also understood that one or more of the protein molecules 
of the invention may be used as molecular markers. 

In a preferred embodiment, the polymorphism is present and screened for in a
mapping population, e.g., a collection of plants capable of being used with markers such as
polymorphic markers to map the genetic position of traits. The choice of an appropriate
mapping population often depends on the type of marker systems employed. Consideration
must be given to the source of parents (adapted vs. exotic) used in the mapping population.
Chromosome pairing and recombination rates can be severely disturbed (suppressed) in wide
crosses (adapted x exotic) and generally yield greatly reduced linkage distances. Wide
crosses will usually provide segregating populations with a relatively large number of
polymorphisms when compared to progeny in a narrow cross (adapted x adapted).

An F2 population is the first generation of selfing (self-pollinating) after the hybrid
seed is produced. Usually a single F1 plant is selfed to generate a population segregating for
all the genes in a Mendelian (1:2:1) pattern. Maximum genetic information is obtained from a
completely classified F2 population using a codominant marker system (Mather, 1938). In
the case of dominant markers, progeny tests (e.g., F3, BCF2) are required to identify the
heterozygotes in order to classify the population. However, this procedure is often
prohibitive because of the cost and time involved in progeny testing. Progeny testing of F2
individuals is often used in map construction where phenotypes do not consistently reflect
genotype (e.g., disease resistance) or where trait expression is controlled by a QTL.
Segregation data from progeny test populations (e.g., F3 or BCF2) can be used in map
construction. Marker-assisted selection can then be applied to cross progeny based on
marker-trait map associations (F2, F3), where linkage groups have not been completely
dissociated by recombination events (i.e., maximum disequilibrium).
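
The Mendelian expectation above can be checked with a chi-square goodness-of-fit test. The sketch below is illustrative and not part of the described method: a codominant marker in a fully classified F2 is tested against 1:2:1, and a dominant marker, which cannot separate heterozygotes from one homozygous class, against 3:1. The critical values are standard table values for alpha = 0.05; the counts and function names are hypothetical.

```python
# Upper critical values of the chi-square distribution at alpha = 0.05.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991}


def chi_square(observed, ratio):
    """Pearson goodness-of-fit statistic for class counts against an expected ratio."""
    total = sum(observed)
    expected = [total * r / sum(ratio) for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def fits_ratio(observed, ratio):
    """True if the counts are consistent with the ratio at alpha = 0.05."""
    df = len(ratio) - 1
    return chi_square(observed, ratio) < CHI2_CRIT_05[df]


# Codominant marker in an F2: AA, Aa and aa are scored separately (1:2:1).
print(chi_square([30, 40, 30], [1, 2, 1]), fits_ratio([30, 40, 30], [1, 2, 1]))  # 4.0 True
# Dominant marker: AA and Aa look alike, so only 3:1 can be tested.
print(fits_ratio([60, 40], [3, 1]))  # False (chi-square = 12.0)
```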

Recombinant inbred lines (RIL) (genetically related lines, usually >F5, developed
by continuously selfing F2 lines towards homozygosity) can be used as a mapping
population. Information obtained from dominant markers can be maximized by using RIL
because all loci are homozygous or nearly so. Under conditions of tight linkage (i.e., about
<10% recombination), dominant and codominant markers evaluated in RIL populations
provide more information per individual than either marker type in backcross populations.
However, as the distance between markers becomes larger (i.e., loci become more
independent), the information in RIL populations decreases dramatically when compared to
codominant markers.
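
The statement that RIL loci are "homozygous or nearly so" by about F5 follows from heterozygosity halving with each generation of selfing. The sketch below computes that expectation and, as an addition not stated in the text, the standard Haldane-Waddington relation R = 2r/(1 + 2r) between the meiotic recombination fraction r and the recombinant fraction R expected among selfed RILs. Function names are hypothetical.

```python
def heterozygosity(generation):
    """Expected fraction of heterozygous loci in F_n under continuous selfing.

    Starts from a fully heterozygous F1; each round of selfing halves heterozygosity.
    """
    if generation < 1:
        raise ValueError("generation must be >= 1 (F1)")
    return 0.5 ** (generation - 1)


def ril_recombinant_fraction(r):
    """Expected fraction of recombinant RILs for a meiotic recombination fraction r
    (Haldane-Waddington expectation for lines derived by selfing)."""
    return 2 * r / (1 + 2 * r)


for n in (2, 5, 8):
    print(f"F{n}: {1 - heterozygosity(n):.4f} of loci homozygous")
# F2: 0.5000, F5: 0.9375, F8: 0.9922
```

Because R exceeds r, RILs expand apparent map distances, which is one reason they resolve tight linkages well.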

Backcross populations (e.g., generated from a cross between a successful variety
(recurrent parent) and another variety (donor parent) carrying a trait not present in the
former) can be utilized as a mapping population. A series of backcrosses to the recurrent
parent can be made to recover most of its desirable traits. Thus a population is created
consisting of individuals nearly like the recurrent parent, but each individual carries varying
amounts or a mosaic of genomic regions from the donor parent. Backcross populations can be
useful for mapping dominant markers if all loci in the recurrent parent are homozygous and
the donor and recurrent parents have contrasting polymorphic marker alleles.

Information obtained from backcross populations using either codominant or
dominant markers is less than that obtained from F2 populations because one, rather than two,
recombinant gametes are sampled per plant. Backcross populations, however, are more
informative (at low marker saturation) when compared to RILs as the distance between
linked loci increases in RIL populations (i.e., about 15% recombination). Increased
recombination can be beneficial for resolution of tight linkages, but may be undesirable in
the construction of maps with low marker saturation.
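
Because each backcross progeny samples a single gamete from the F1, the recombination fraction is estimated as a simple proportion. A minimal sketch, assuming hypothetical counts and using the Haldane mapping function (one common choice; the text does not specify one) to convert r to centimorgans:

```python
import math


def backcross_recombination(recombinants, total):
    """Recombination fraction r and its standard error from a backcross.

    Each progeny reveals one gamete from the F1 parent, so r is the proportion
    of recombinant progeny (binomial sampling).
    """
    r = recombinants / total
    se = math.sqrt(r * (1 - r) / total)
    return r, se


def haldane_cm(r):
    """Map distance in centimorgans from a recombination fraction r < 0.5 (Haldane)."""
    return -50.0 * math.log(1 - 2 * r)


r, se = backcross_recombination(15, 100)  # hypothetical counts
print(f"r = {r:.2f} +/- {se:.3f}, distance = {haldane_cm(r):.1f} cM")
# r = 0.15 +/- 0.036, distance = 17.8 cM
```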

Near-isogenic lines (NIL) (created by many backcrosses to produce a collection of 
individuals that is nearly identical in genetic composition except for the trait or genomic 
region under interrogation) can be used as a mapping population. In mapping with NILs, 
only a portion of the polymorphic loci is expected to map to a selected region. 

Bulk segregant analysis (BSA) is a method developed for the rapid identification of
linkage between markers and traits of interest (Michelmore et al., PNAS 88:9828-9832
(1991)). In BSA, two bulked DNA samples are drawn from a segregating population
originating from a single cross. These bulks contain individuals that are identical for a
particular trait (e.g., resistant or susceptible to a particular disease) or genomic region but
arbitrary at unlinked regions (i.e., heterozygous). Regions unlinked to the target region will not differ
between the bulked samples of many individuals in BSA.
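
The BSA comparison can be sketched as a filter over markers scored in both bulks. This assumes each marker is summarized as the frequency of one parental allele in each bulk (bulks are often scored simply for the presence or absence of marker bands), and the 0.8 threshold and marker names are hypothetical.

```python
def bsa_candidates(bulk_a, bulk_b, min_diff=0.8):
    """Markers whose parental-allele frequency differs strongly between two bulks.

    bulk_a, bulk_b: dict mapping marker name -> frequency (0..1) of one parental
    allele in that bulk. Markers unlinked to the trait should show similar
    frequencies in both bulks; tightly linked markers approach 1.0 versus 0.0.
    """
    shared = sorted(set(bulk_a) & set(bulk_b))
    return [m for m in shared if abs(bulk_a[m] - bulk_b[m]) >= min_diff]


resistant = {"m1": 1.0, "m2": 0.50, "m3": 0.95}    # hypothetical values
susceptible = {"m1": 0.0, "m2": 0.45, "m3": 0.05}
print(bsa_candidates(resistant, susceptible))  # ['m1', 'm3']
```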

H. Determining the Level of Expression Response 

In an aspect of the present invention, one or more of the nucleic acid molecules of the
present invention are used to determine the level (i.e., the concentration of mRNA in a
sample, etc.) or pattern (i.e., the kinetics of expression, rate of decomposition, stability
profile, etc.) of the expression of a protein encoded in part or whole by one or more of the
nucleic acid molecules of the present invention (collectively, the "Expression Response" of a
cell or tissue).

As used herein, the Expression Response manifested by a cell or tissue is said to be
"altered" if it differs from the Expression Response of cells or tissues of plants not exhibiting
the phenotype. To determine whether an Expression Response is altered, the Expression
Response manifested by the cell or tissue of the plant exhibiting the phenotype is compared
with that of a similar cell or tissue sample of a plant not exhibiting the phenotype. As will be
appreciated, it is not necessary to re-determine the Expression Response of the cell or tissue
sample of plants not exhibiting the phenotype each time such a comparison is made; rather,
the Expression Response of a particular plant may be compared with previously obtained
values of normal plants.
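
Comparing a sample with previously obtained values of normal plants can be as simple as a fold-change test. The sketch below is an illustration only: the two-fold cutoff, the use of the reference mean, and the function name are assumptions, not criteria given in the text.

```python
import math


def expression_change(sample_level, reference_levels, fold_threshold=2.0):
    """Compare one sample with previously obtained reference values.

    Returns (log2 fold change versus the reference mean, altered?), where
    'altered' means a change of at least fold_threshold in either direction.
    """
    ref_mean = sum(reference_levels) / len(reference_levels)
    log2_fc = math.log2(sample_level / ref_mean)
    return log2_fc, abs(log2_fc) >= math.log2(fold_threshold)


stored_normal = [100.0, 110.0, 90.0]  # hypothetical earlier measurements
print(expression_change(400.0, stored_normal))  # (2.0, True)
print(expression_change(120.0, stored_normal))  # about 0.26, so not altered
```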

A change in genotype or phenotype may be transient or permanent. Also as used
herein, a tissue sample is any sample that comprises more than one cell. In a preferred
aspect, a tissue sample comprises cells that share a common characteristic (e.g., derived from
root, seed, flower, leaf, stem or pollen).

In one aspect of the present invention, an evaluation can be conducted to determine
whether a particular mRNA molecule is present. One or more of the nucleic acid molecules
of the present invention are utilized to detect the presence or quantity of the mRNA species.
Such molecules are then incubated with cell or tissue extracts of a plant under conditions
sufficient to permit nucleic acid hybridization. The detection of double-stranded probe-mRNA
hybrid molecules is indicative of the presence of the mRNA; the amount of such
hybrid formed is proportional to the amount of mRNA. Thus, such probes may be used to
ascertain the level and extent of the mRNA production in a plant's cells or tissues. Such
nucleic acid hybridization may be conducted under quantitative conditions (thereby
providing a numerical value of the amount of the mRNA present). Alternatively, the assay
may be conducted as a qualitative assay that indicates either that the mRNA is present, or
that its level exceeds a user-set, predefined value.
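
The quantitative and qualitative modes of the hybridization assay differ only in how the signal is reported. A minimal sketch, assuming the signal rises linearly with mRNA amount over the range of a set of standards of known amount (consistent with the proportionality stated above); all names and values are hypothetical and in arbitrary units:

```python
def fit_standard_curve(amounts, signals):
    """Least-squares line signal = slope * amount + intercept from known standards."""
    n = len(amounts)
    mx = sum(amounts) / n
    my = sum(signals) / n
    sxx = sum((x - mx) ** 2 for x in amounts)
    sxy = sum((x - mx) * (y - my) for x, y in zip(amounts, signals))
    slope = sxy / sxx
    return slope, my - slope * mx


def quantify(signal, slope, intercept):
    """Quantitative mode: convert a hybridization signal into an mRNA amount."""
    return (signal - intercept) / slope


def exceeds(signal, threshold):
    """Qualitative mode: report only whether the signal exceeds a user-set value."""
    return signal > threshold


slope, intercept = fit_standard_curve([0, 1, 2, 4], [5, 15, 25, 45])  # hypothetical standards
print(quantify(35, slope, intercept))  # 3.0
print(exceeds(35, threshold=20))       # True
```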

A number of methods can be used to compare the Expression Response between two
or more samples of cells or tissue. These methods include hybridization assays, such as
Northern blots, RNase protection assays, and in situ hybridization. Alternatively, the methods
include PCR-type assays. In a preferred method, the Expression Response is compared by
hybridizing nucleic acids from the two or more samples to an array of nucleic acids. The
array contains a plurality of sequences known or suspected of being present in the
cells or tissue of the samples.

An advantage of in situ hybridization over other techniques for the detection of
nucleic acids is that it allows an investigator to determine the precise spatial distribution
of the target. In situ hybridization may be used to measure the steady-state level of RNA
accumulation. A number of protocols have been devised for in situ hybridization, each with
its own tissue preparation, hybridization and washing conditions.

In situ hybridization also allows for the localization of proteins within a tissue or
cell. It is understood that one or more of the molecules of the invention, preferably one or
more of the nucleic acid molecules or fragments thereof of the invention or one or more of
the antibodies of the invention, may be utilized to detect the level or pattern of a protein or
mRNA thereof by in situ hybridization.

Fluorescent in situ hybridization allows the localization of a particular DNA
sequence along a chromosome, which is useful, among other uses, for gene mapping,
following chromosomes in hybrid lines, or detecting chromosomes with translocations,
transversions or deletions. In situ hybridization has been used to identify chromosomes in
several plant species. It is understood that the nucleic acid molecules of the invention may
be used as probes or markers to localize sequences along a chromosome.

Another method to localize the expression of a molecule is tissue printing. Tissue
printing provides a way to screen many tissue sections from different plants or different
developmental stages at the same time on the same membrane. See, e.g., Barres et al.,
Neuron 5:527-544 (1990); Cassab and Varner, J. Cell Biol. 105:2581-2588 (1987); Harris
and Chrispeels, Plant Physiol. 56:292-299 (1975); Reid and Pont-Lezica, Tissue Printing:
Tools for the Study of Anatomy, Histochemistry and Gene Expression, Academic Press, New
York, New York (1992); Reid et al., Plant Physiol. 93:160-165 (1990); Spruce et al.,
Phytochemistry 26:2901-2903 (1987); Ye et al., Plant J. 1:175-183 (1991); Yomo and
Taylor, Planta 112:35-43 (1973).

A microarray-based method for high-throughput monitoring of gene expression may
also be utilized to measure Expression Response. This 'chip'-based approach involves
microarrays of nucleic acid molecules as gene-specific hybridization targets to quantitatively
measure expression of the corresponding mRNA. Hybridization to a microarray can be used
to efficiently analyze the presence and/or amount of a number of nucleotide sequences
simultaneously.

Several microarray methods have been described. One method compares the
sequences to be analyzed by hybridization to a set of oligonucleotides representing all
possible subsequences. A second method hybridizes the sample to an array of
oligonucleotide or cDNA molecules. An array consisting of oligonucleotides complementary
to subsequences of a target sequence can be used to determine the identity of a target
sequence, measure its amount, and detect single nucleotide differences between the target
and a reference sequence. Nucleic acid molecule microarrays may also be screened with
protein molecules or fragments thereof to determine nucleic acid molecules that specifically 
bind protein molecules or fragments thereof. 
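
The use of an array of oligonucleotides complementary to subsequences of a target to detect single nucleotide differences can be simulated in a few lines. This sketch treats hybridization as all-or-nothing perfect matching, which real arrays only approximate; the probe length, sequences, and function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def tiling_probes(target, k):
    """Probes complementary to every k-length subsequence of the target, in order."""
    return [reverse_complement(target[i:i + k]) for i in range(len(target) - k + 1)]


def unmatched_probes(reference, sample, k):
    """Start positions of reference-derived probes with no perfect match in the sample.

    A single-nucleotide difference removes the perfect match for every probe that
    overlaps it (assuming those k-mers occur nowhere else in the sample).
    """
    sample_probes = set(tiling_probes(sample, k))
    return [i for i, p in enumerate(tiling_probes(reference, k)) if p not in sample_probes]


# Hypothetical sequences: the sample carries a C->G change at position 5.
print(unmatched_probes("GATTACAGCT", "GATTAGAGCT", 3))  # [3, 4, 5]
```

The run of unmatched probes brackets the variant: with probes of length k, a single substitution at position p knocks out probes starting at p - k + 1 through p.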

The microarray approach may be used with polypeptide targets (U.S. Patent Nos.
5,445,934; 5,143,854; 5,079,600; and 4,923,901). Essentially, polypeptides are synthesized
on a substrate (microarray), and these polypeptides can be screened with either protein
molecules or fragments thereof or nucleic acid molecules in order to screen for either protein
molecules or fragments thereof or nucleic acid molecules that specifically bind the target
polypeptides.

In a preferred embodiment of the present invention, microarrays may be prepared that
comprise nucleic acid molecules where preferably at least about 10%, preferably at least
about 25%, more preferably at least about 50% and even more preferably at least about 75%,
80%, 85%, 90% or 95% of the nucleic acid molecules located on that array are selected from
the group of nucleic acid molecules that hybridize under low, moderate or high stringency
conditions to one or more nucleic acid molecules having a nucleic acid sequence selected
from the group of SEQ ID NO: 1 through 3, 5 through 47, and complements thereof.

In another preferred embodiment of the present invention, microarrays may be
prepared that comprise nucleic acid molecules where preferably at least about 10%,
preferably at least about 25%, more preferably at least about 50% and even more preferably
at least about 75%, 80%, 85%, 90% or 95% of the nucleic acid molecules located on that
array are selected from the group of nucleic acid molecules having a nucleic acid sequence
selected from the group of SEQ ID NO: 1 through 3, 5 through 47, complements thereof, and
fragments of these sequences.

In a preferred embodiment of the present invention, microarrays may be prepared that
comprise nucleic acid molecules where such nucleic acid molecules encode at least one,
preferably at least two, more preferably at least three, even more preferably at least four, five
or six proteins or fragments thereof selected from the group consisting of gcpE, ygbB, ygbP,
ychB, dxs and dxr.

The present invention includes and provides a method for determining a level or
pattern of a protein in a plant cell or plant tissue comprising (A) incubating under conditions
permitting nucleic acid hybridization: (i) a marker nucleic acid molecule having a nucleic
acid sequence that hybridizes to a sequence selected from the group consisting of SEQ ID
NOs: 1 through 3, 5 through 47, and complements thereof; and (ii) a complementary nucleic
acid molecule obtained from the plant cell or plant tissue, wherein nucleic acid hybridization
between the marker nucleic acid molecule and the complementary nucleic acid molecule
permits the detection of an mRNA for the protein; (B) permitting hybridization between the
marker nucleic acid molecule and the complementary nucleic acid molecule; and (C)
detecting the level or pattern of the complementary nucleic acid, wherein the detection of
the complementary nucleic acid is predictive of the level or pattern of the protein in the plant.

The present invention also includes and provides a method for determining a level or
pattern of a protein in a plant cell or plant tissue comprising (A) assaying the concentration
of the protein in a first sample obtained from the plant cell or plant tissue; (B) assaying the
concentration of the protein in a second sample obtained from a reference plant cell or a
reference plant tissue with a known level or pattern of the protein; and (C) comparing the
assayed concentration of the protein in the first sample to the assayed concentration of the
protein in the second sample.

I. Screening Uses

The present invention provides methods and agents that can be used to screen for and
isolate genes associated with the MEP pathway. Because the MEP pathway is an essential
pathway, disruption of any essential gene in the MEP pathway will result in the death of the
cell or organism. While not being limited to any particular biological process, the present
invention provides a method, and the agents associated with such a method, in which mutations
that result in loss of function of an MEP pathway gene do not result in cell or organism death
because a second pathway capable of synthesizing IPP and DMAPP is provided. The present
invention provides cells and organisms having a second pathway capable of synthesizing IPP
and DMAPP.

In a preferred aspect, the present invention provides a cell or organism comprising: (a) a
first DNA sequence encoding an enzyme having catalytic activity of mevalonate kinase; (b) a
second DNA sequence encoding an enzyme having catalytic activity of 5-phosphomevalonate
kinase; (c) a third DNA sequence encoding an enzyme having catalytic activity of
5-diphosphomevalonate decarboxylase; and (d) a fourth DNA sequence encoding an enzyme
having catalytic activity of isopentenyl diphosphate isomerase; wherein at least two of said
first, second, third, or fourth DNA sequences have a foreign DNA sequence.

In a preferred aspect, the second pathway capable of synthesizing IPP and DMAPP
has at least one, more preferably at least two, even more preferably at least three or four
enzymes selected from the group consisting of: mevalonate kinase, 5-phosphomevalonate
kinase, 5-diphosphomevalonate decarboxylase and isopentenyl diphosphate isomerase. In a
more preferred embodiment, at least two, even more preferably at least three or four of the
enzymes selected from the group consisting of: mevalonate kinase, 5-phosphomevalonate
kinase, 5-diphosphomevalonate decarboxylase and isopentenyl diphosphate isomerase are
encoded by a foreign DNA sequence. Any foreign DNA encoding such enzymes may be
utilized, such as the human 5-phosphomevalonate kinase (GenBank Accession No. H09914).

Any cell or organism that possesses the MEP pathway may be used in this aspect of
the invention. By providing a second pathway capable of synthesizing IPP and DMAPP,
such cells can be utilized in methods to examine the function of a gene, determine whether a
gene is associated with the MEP pathway, and identify a gene associated with the MEP
pathway.

The present invention includes and provides a cell comprising: (a) a first DNA
sequence encoding an enzyme having catalytic activity of mevalonate kinase; (b) a second
DNA sequence encoding an enzyme having catalytic activity of 5-phosphomevalonate
kinase; (c) a third DNA sequence encoding an enzyme having catalytic activity of
5-diphosphomevalonate decarboxylase; and (d) a fourth DNA sequence encoding an enzyme
having catalytic activity of isopentenyl diphosphate isomerase; wherein at least two of the
first, second, third or fourth DNA sequences have a foreign DNA sequence.

The present invention includes and provides a method for examining the function
of a gene associated with the MEP pathway, comprising: (a) rendering inoperative the gene
in a first cell capable of converting mevalonic acid to isopentenyl diphosphate and
dimethylallyl diphosphate; (b) rendering inoperative the gene in a second cell incapable of
converting mevalonic acid to isopentenyl diphosphate and dimethylallyl diphosphate; and (c)
determining the viability of the first cell and the second cell.

The present invention includes and provides a method for determining whether a
gene is associated with the MEP pathway, comprising: (a) rendering inoperative the gene in a
first cell capable of converting mevalonic acid to isopentenyl diphosphate and dimethylallyl
diphosphate; (b) rendering inoperative the gene in a second cell incapable of converting
mevalonic acid to isopentenyl diphosphate and dimethylallyl diphosphate; and (c)
determining the viability of the first cell and the second cell.

The present invention includes and provides a method for identifying a gene
associated with the MEP pathway, comprising: (a) rendering inoperative the gene in a first
cell capable of converting mevalonic acid to isopentenyl diphosphate and dimethylallyl
diphosphate; (b) rendering inoperative the gene in a second cell incapable of converting
mevalonic acid to isopentenyl diphosphate and dimethylallyl diphosphate; and (c)
determining the viability of the first cell and the second cell.
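
The three methods above share one readout: whether a knockout survives in a cell that can bypass the MEP pathway through mevalonate and in one that cannot. The decision logic can be sketched as follows, assuming the bypass cell is supplied with mevalonate and expresses the MVA+ unit; the function name and the result labels are hypothetical.

```python
def interpret_knockout(viable_with_mva_bypass, viable_without_bypass):
    """Interpret the same gene knockout in two cells.

    viable_with_mva_bypass: knockout survives in a cell able to convert mevalonate
        to IPP and DMAPP (assumed to be supplied with mevalonate).
    viable_without_bypass: knockout survives in a cell lacking that ability.
    """
    if viable_with_mva_bypass and not viable_without_bypass:
        # Lethal only when the MEP pathway is the sole source of IPP/DMAPP.
        return "candidate MEP pathway gene"
    if viable_with_mva_bypass and viable_without_bypass:
        return "not essential under these conditions"
    if not viable_with_mva_bypass and not viable_without_bypass:
        return "essential for another reason (not rescued by the bypass)"
    return "inconsistent result; repeat the assay"


print(interpret_knockout(True, False))  # candidate MEP pathway gene
```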

Application of the teachings of the present invention to a specific problem or
environment is within the capabilities of one having ordinary skill in the art in light of the
teachings contained herein. Examples of the products and processes of the present invention
appear in the following examples, which are provided by way of illustration and are not
intended to be limiting of the present invention.

EXAMPLE 1

ISOLATION AND MUTAGENESIS OF THE CODING SEQUENCES
OF THE MVA+ TRANSCRIPTION UNIT

Yeast Diphosphomevalonate Decarboxylase (yPMD, ORF YNR043w, ERG19)

The coding sequence of yPMD is amplified by PCR using genomic DNA from
Saccharomyces cerevisiae strain FY1679 as template. The reaction mixture of the PCR is
prepared in a final volume of 25 µl containing 1 µg of template, 0.5 µM of primers CINCO
(SEQ ID NO: 51) and SEIS (SEQ ID NO: 52), 100 µM of each deoxynucleoside triphosphate
(dNTPs) and Pfu reaction buffer (20 mM of Tris-HCl adjusted to pH 8.8, 2 mM of MgSO4,
10 mM of KCl, 10 mM of (NH4)2SO4, 0.1% of Triton X-100, 100 µg/ml of BSA). The
sample is covered with mineral oil, incubated at 96° C for 3 minutes and cooled to 80° C.
Pfu DNA polymerase (1 unit, Stratagene) is added and the reaction mixture is incubated for
30 cycles consisting of 1 minute at 94° C and 4 minutes 30 sec at 72° C, followed by a final
step of 10 minutes at 72° C. The PCR product (1879 bp) is cloned in the Sma I restriction
site of plasmid pBluescript SK+.
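
The reaction recipe above gives final concentrations, not pipetting volumes. The volumes follow from C1V1 = C2V2; the stock concentrations in this sketch (10 µM primers, 10 mM of each dNTP, 10x buffer, template at 0.5 µg/µl) are assumptions, not values stated in the example.

```python
def volume_to_add(final_conc, stock_conc, final_volume):
    """C1 * V1 = C2 * V2 solved for V1 (stock and final in the same units)."""
    return final_conc * final_volume / stock_conc


def reaction_setup(final_volume_ul, components):
    """components: name -> (final concentration, stock concentration).

    Returns the volume (microliters) of each component plus the water needed to
    reach the final volume. The polymerase, added after the hot start, is not
    included, so its volume must still be subtracted from the water.
    """
    volumes = {name: volume_to_add(final, stock, final_volume_ul)
               for name, (final, stock) in components.items()}
    volumes["water"] = final_volume_ul - sum(volumes.values())
    return volumes


# Hypothetical stocks: 10 uM primers, 10 mM (10000 uM) dNTPs, 10x buffer, 0.5 ug/ul DNA.
mix = reaction_setup(25.0, {
    "primer CINCO": (0.5, 10.0),        # uM
    "primer SEIS": (0.5, 10.0),         # uM
    "dNTPs (each)": (100.0, 10000.0),   # uM
    "Pfu buffer": (1.0, 10.0),          # x
    "template": (1.0 / 25.0, 0.5),      # ug/ul, i.e. 1 ug in 25 ul
})
for name, vol in mix.items():
    print(f"{name}: {vol:.2f} ul")
```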

Nde I and Eco RI restriction sites are introduced, respectively, at the 5' and 3' ends of
the yPMD coding sequence by PCR, using plasmid DNA as template. The reaction mixture
of the PCR is prepared in a final volume of 50 µl containing 200 ng of template, 1 µM of
primers MPD-Nde5' (SEQ ID NO: 53) and MPD-Eco3' (SEQ ID NO: 54), 100 µM of dNTPs,
Pfu reaction buffer and 1.25 units of Pfu DNA polymerase. The sample is denatured for 2
minutes at 94° C and incubated for 10 cycles consisting of 1 minute at 94° C, 1 minute at
61° C and 2 minutes 30 sec at 72° C. The PCR product (1207 bp) is cloned in the Sma I
restriction site of plasmid pBluescript SK+. Sequencing is performed to ensure that no
additional mutation has been introduced during amplification.

Human 5-Phosphomevalonate Kinase (hPMK)

A Hpa I restriction site is introduced at both ends of the coding sequence of the
human 5-phosphomevalonate kinase by PCR, using the cDNA clone ym0505.r1 from the Soares
infant brain 1NIB library as template. The clone ym0505.r1 (I.M.A.G.E. 46897; GenBank accession
number H09914) is obtained from Research Genetics, Inc. (Huntsville, Alabama). The
reaction mixture of the PCR is prepared in a final volume of 50 µl containing 200 ng of
template, 1 µM of primers hPMK1 (SEQ ID NO: 55) and hPMK4 (SEQ ID NO: 56), 100 µM
of dNTPs, Pfu reaction buffer and 1.25 units of Pfu DNA polymerase. The sample is
denatured for 2 minutes at 94° C and incubated for 10 cycles consisting of 30 sec at 94° C, 40
sec at 65° C and 1 minute 45 sec at 72° C. The PCR product (601 bp) is cloned in the Sma I
restriction site of plasmid pBluescript SK+ and sequenced.

Yeast Mevalonate Kinase (yMVK, ORF YMR208w, ERG12)

The coding sequence of yMVK is amplified by PCR using genomic DNA from
Saccharomyces cerevisiae strain FY1679 as template. The reaction mixture of the PCR is
prepared in a final volume of 25 µl containing 1 µg of template, 0.5 µM of primers UNO
(SEQ ID NO: 57) and DOS (SEQ ID NO: 58), 100 µM of dNTPs and Pfu reaction buffer.
The sample is covered with mineral oil, incubated at 96° C for 3 minutes and cooled to
80° C. One unit of Pfu DNA polymerase is added and the reaction mixture is incubated for 30
cycles consisting of 1 minute at 94° C and 4 minutes 30 sec at 72° C, followed by a final step
of 10 minutes at 72° C. The PCR product (1744 bp) is cloned in the Sma I restriction site of
plasmid pBluescript SK+.

A Hpa I restriction site is introduced at both ends of the yMVK coding sequence by
PCR, using plasmid DNA as template. The reaction mixture of the PCR is prepared in a final
volume of 50 µl containing 200 ng of template, 1 µM of primers MK-Hpa5' (SEQ ID NO:
59) and MK-Hpa3' (SEQ ID NO: 60), 100 µM of dNTPs, Pfu reaction buffer and 1.25 units
of Pfu DNA polymerase. The sample is denatured for 2 minutes at 94° C and incubated for
10 cycles consisting of 45 sec at 94° C, 45 sec at 57° C and 2 minutes 50 sec at 72° C. The
PCR product (1351 bp) is cloned in the Sma I restriction site of plasmid pBluescript SK+ and
sequenced.

Isopentenyl Diphosphate Isomerase from Escherichia coli (ecIDI)

The coding sequence of the isopentenyl diphosphate isomerase from E. coli is
amplified by PCR, using genomic DNA from strain W3110 as template. In this PCR, an Xho I
restriction site is introduced at both ends of the coding sequence. The reaction mixture of the
PCR is prepared in a final volume of 50 µl containing 200 ng of template, 0.5 µM of primers
idi5X (SEQ ID NO: 61) and idi3X (SEQ ID NO: 62), 100 µM of dNTPs and Pfu reaction
buffer. The sample is covered with mineral oil, incubated at 96° C for 3 minutes and cooled
to 80° C. Pfu DNA polymerase (1.5 units) is added and the reaction mixture is incubated for
5 cycles consisting of 30 sec at 94° C, 40 sec at 55° C and 1 minute 45 sec at 72° C and 25
cycles consisting of 30 sec at 94° C and 2 minutes 15 sec at 72° C. The PCR product (569 
bp) is cloned in the Sma I restriction site of plasmid pBluescript SK+. 
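
The cycling programs in Example 1 differ in length, and their total hold time is simple arithmetic. This sketch ignores ramp times and the pause at 80° C for adding Pfu polymerase, so actual run times will be longer; the function name is hypothetical.

```python
def program_seconds(initial_s, phases, final_s=0):
    """Total hold time of a thermal-cycling program, ignoring ramp times and the
    pause for adding polymerase. phases: list of (cycles, [step durations in s])."""
    return initial_s + sum(cycles * sum(steps) for cycles, steps in phases) + final_s


# ecIDI program: 3 min at 96 C, then 5 x (30 + 40 + 105 s) and 25 x (30 + 135 s).
ecidi = program_seconds(180, [(5, [30, 40, 105]), (25, [30, 135])])
# yPMD program: 3 min at 96 C, 30 x (60 + 270 s), then a 10 min final extension.
ypmd = program_seconds(180, [(30, [60, 270])], final_s=600)
print(ecidi, ypmd)  # 5180 10680 (about 86 and 178 minutes)
```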

EXAMPLE 2

ASSEMBLY OF THE MVA+ TRANSCRIPTION UNIT

The transcription unit is assembled in a derivative of the expression vector pBAD-GFPuv
(Clontech, Palo Alto, California; GenBank accession number U62637). This is a
high copy number plasmid that belongs to the pMB1/ColE1 incompatibility group. The final
transcription unit is composed of four ORFs coding for yPMD, hPMK, yMVK and ecIDI.
The coding sequences are preceded by ribosomal binding sites that consist of a Shine-Dalgarno
sequence followed by an AT-rich translation spacer of eight bases (optimal
distance to the ATG start codon; Makrides, Microbiol. Rev. 60:512+ (1996)). The whole
construct is under control of the PBAD promoter, which can be induced in the presence of
L-(+)-arabinose and repressed in the presence of D-(+)-glucose and absence of L-(+)-arabinose.
Lobell and Schleif, Science 250:528-532 (1990); Guzman et al., J. Bacteriol. 177:4121-4130
(1995).
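
The ribosome binding site layout described above (Shine-Dalgarno sequence, eight-base AT-rich spacer, ATG) can be checked on a candidate junction with a short scan. The junction sequence and function name here are hypothetical; the Shine-Dalgarno consensus "TAAGGAGG" is the one used in the polylinker of this example.

```python
SD_CONSENSUS = "TAAGGAGG"


def check_rbs(seq, sd=SD_CONSENSUS):
    """Find the Shine-Dalgarno sequence and the first downstream ATG.

    Returns (spacer length in bases, AT fraction of the spacer), or None if either
    element is missing.
    """
    seq = seq.upper()
    sd_pos = seq.find(sd)
    if sd_pos < 0:
        return None
    spacer_start = sd_pos + len(sd)
    atg = seq.find("ATG", spacer_start)
    if atg < 0:
        return None
    spacer = seq[spacer_start:atg]
    at_fraction = sum(base in "AT" for base in spacer) / len(spacer) if spacer else 0.0
    return len(spacer), at_fraction


# Hypothetical junction: SD consensus, an 8-base AT-rich spacer, then the start codon.
print(check_rbs("GC" + "TAAGGAGG" + "AATTAAAA" + "ATGCGT"))  # (8, 1.0)
```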

As a preliminary step, the Nde I restriction site located between pBR322ori and the
araC coding region of pBAD-GFPuv (positions 4926-4931) is eliminated by site-directed
mutagenesis as described (Kunkel et al., Meth. Enzymology 154:367-382, 1987), using the
oligonucleotide pBAD-mut1 (SEQ ID NO: 63) as mutagenic primer. The mutation is
confirmed by restriction analysis and sequencing. The plasmid obtained is named pAB-M0.
The GFP coding sequence of pAB-M0 is substituted by the yPMD coding sequence. This
sequence is cloned between the Nde I and Eco RI restriction sites, taking advantage of the
modifications introduced at the ends of the yPMD sequence. The yPMD sequence is the first
coding sequence of the transcription unit.

To clone the other coding sequences, a polylinker is first introduced between the Eco RI
and Sal I restriction sites. The polylinker is generated by annealing the oligonucleotides
pBAD-Link1 (SEQ ID NO: 64) and pBAD-Link2 (SEQ ID NO: 65). It contains the
restriction sites Pme I and Sna BI, flanked by cohesive ends of Eco RI and Sal I sites. Sites
Pme I, Sna BI and Sal I are preceded by the Shine-Dalgarno consensus sequence
"TAAGGAGG". The modified inserts coding for hPMK and yMVK are digested with Hpa I
and blunt ligated, respectively, into the Pme I and Sna BI restriction sites. The modified insert
coding for ecIDI is digested with Xho I and ligated into the Sal I restriction site. Insert
orientation is confirmed after every step by PCR and sequencing.

The plasmid containing yPMD, hPMK and yMVK is named pAB-M2. The plasmid
containing, in addition, ecIDI is named pAB-M3.

EXAMPLE 3

STABLE INTEGRATION OF THE MVA+ TRANSCRIPTION
UNIT INTO THE E. coli CHROMOSOME

Transfer of the MVA+ transcription unit to the chromosome of E. coli is achieved
with a genetic system based on two elements: the E. coli strain TE2680 (Elliott, J. Bacteriol.
174:245-253, 1992) and a pRS550-derived plasmid (Simons et al., Gene 53:85-96, 1987).
Strain TE2680 is a recD (TetR) mutant host that allows efficient recombination of a linear
(restriction enzyme-cleaved) DNA with homologous sequences present in the chromosome.
The new sequence is incorporated as a single copy and is perpetuated through cell division.

The sequence of interest, the MVA+ transcription unit in this case, can be cloned in the
pRS550 vector, between a functional kanamycin resistance (KanR) gene and a promoterless
version of the lac operon. A similar cassette is present in the recipient host (strain TE2680),
interrupting the trp operon. This strain is auxotrophic for tryptophan. In this case, however,
a non-functional kanamycin resistance (KanS) gene and the deleted version of the lac operon
flank a functional chloramphenicol resistance (CamR) gene. A double crossover
affecting the Kan gene and the deleted version of the lac operon substitutes the sequence of
interest for the CamR gene in the chromosome. As a consequence of the crossover, the
recipient strain, originally KanS and CamR, becomes KanR and CamS.

The MVA+ transcription unit is amplified by PCR using the pAB-M3 plasmid as
template and oligonucleotides pBAD-D2 (SEQ ID NO: 66) and pBAD-U3 (SEQ ID NO: 67)
as primers. The reaction mixture of the PCR is prepared in a final volume of 50 µl
containing 200 ng of template, 1 µM of primers, 200 µM of dNTPs, Pfu reaction buffer and
1.75 units of Pfu DNA polymerase. The sample is denatured for 2 minutes at 94° C and
incubated for 10 cycles consisting of 40 sec at 94° C, 50 sec at 59° C and 8 minutes 15 sec at
72° C. The amplified sequence (4126 bp) contains the complete promoter, including the
regulatory sequences that respond to arabinose and glucose, and the four ORFs that allow
conversion of MVA to IPP and DMAPP, but lacks the transcription termination signals that
are originally present in the expression cassette.

A polylinker is introduced in the vector pRS550 to allow cloning of the PCR product
containing the MVA+ transcription unit. The polylinker is generated by annealing the
oligonucleotides pRS-L1 (SEQ ID NO: 68) and pRS-L2 (SEQ ID NO: 69). It contains the
restriction sites Pme I, Sma I/Srf I and Not I, flanked by cohesive ends of Bam HI and Eco RI
sites. Plasmid pRS2110 is generated by cloning the polylinker between the Bam HI and Eco RI
restriction sites of vector pRS550. The MVA+ transcription unit is cloned in the Pme I restriction
site of vector pRS2110, in the same orientation as the promoterless lac operon, thus
restoring transcription of the lac operon. The plasmid obtained is named pRS-MVA+.
Plasmid pRS-MVA+ is digested with the Sal I and Sca I restriction enzymes. This
digestion yields a 3196 bp fragment containing the ampicillin resistance gene and a 13406
bp fragment containing the Kan gene, the MVA+ transcription unit and the deleted version of
the lac operon. Strain EcAB3-1 is obtained by transformation of strain TE2680 with the
linear plasmid DNA. The presence of the MVA+ transcription unit in the chromosome of this
strain is confirmed by PCR. The activity of this transcription unit is confirmed by the
appearance of blue colonies on plates containing 5-bromo-4-chloro-3-indolyl
β-D-galactopyranoside (X-gal). Strain EcAB3-1 is resistant to kanamycin (25 µg/ml) and
tetracycline (6 µg/ml) and sensitive to chloramphenicol (17 µg/ml) and ampicillin (50
µg/ml). The MVA+ transcription unit is transduced to E. coli strain MG1655 using phage P1.
The strain obtained is named EcAB4-1.
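
The two fragment sizes reported for the Sal I/Sca I digest imply a plasmid of 3196 + 13406 = 16602 bp. The sketch below computes fragment sizes for a circular plasmid from cut coordinates; the coordinates used are hypothetical, since the text reports only the resulting sizes, and the function name is illustrative.

```python
def circular_fragments(plasmid_length, cut_positions):
    """Sizes (bp) of the linear fragments from cutting a circular plasmid.

    cut_positions: cut coordinates in bp (0 <= pos < plasmid_length). Returns an
    empty list if there are no cuts, because an uncut plasmid stays circular.
    """
    cuts = sorted(set(cut_positions))
    if not cuts:
        return []
    sizes = [b - a for a, b in zip(cuts, cuts[1:])]
    sizes.append(plasmid_length - cuts[-1] + cuts[0])  # fragment spanning position 0
    return sorted(sizes)


# Hypothetical cut coordinates chosen to reproduce the sizes reported above.
print(circular_fragments(3196 + 13406, [100, 3296]))  # [3196, 13406]
```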

EXAMPLE 4

IDENTIFICATION AND FEATURES OF THE gcpE GENE FROM
E. coli AND A PUTATIVE HOMOLOG FROM Arabidopsis thaliana

To identify genes potentially involved in the MEP pathway, a bioinformatic
approach is adopted. Because bacterial genes with related functions are often organized in
operons, uncharacterized open reading frames (ORFs) that lie beside known genes of the
MEP pathway are examined. An ORF of 1195 bp with unknown function is found just
upstream of a DXS coding sequence of Streptomyces coelicolor (cosmid 6A5, Accession
Number AL049485). This ORF is homologous to an essential gene of Escherichia coli
named gcpE (Baker et al., FEMS Microbiol. Lett. 94:175-180, 1992 (accession number
X64451)). A homolog of this gene, named aarC, is identified in Providencia stuartii and
described as an essential gene involved in density-dependent regulation of the
2'-N-acetyltransferase (Rather et al., J. Bacteriol. 179:2267-2273, 1997). However, no precise
function was assigned to the aarC gene.

The gcpE gene is broadly distributed in evolution. The occurrence of this gene in completely sequenced genomes strictly correlates with the occurrence of the gene encoding 1-deoxy-D-xylulose 5-phosphate reductoisomerase (dxr), which catalyses the first committed step of the MEP pathway. Fourteen of 26 sequenced genomes contain both dxr and gcpE; the remaining twelve contain neither dxr nor gcpE. The gcpE gene is also highly conserved in plants. GcpE homologs are found as EST entries in Arabidopsis thaliana (gb T46582, SEQ ID NO: 5), Glycine max (gb AW152929, SEQ ID NO: 6), Lycopersicon esculentum (gb AW040413, SEQ ID NO: 7), Mesembryanthemum crystallinum (gb AI822799, SEQ ID NO: 8), Oryza sativa (gb AA753160, SEQ ID NO: 9), Zea mays (gb AW126434, SEQ ID NO: 10), Pinus taeda (gb AW042702, SEQ ID NO: 11) and Physcomitrella patens (gb AW497432, SEQ ID NO: 12).

A cDNA clone from Arabidopsis coding for a gcpE homolog (EST clone 135H1T7, accession number T46582) is obtained from the Arabidopsis Biological Resource Center (ABRC). This clone encodes a full-length protein. The cDNA contains an ORF of 2223 bp that encodes a protein of 740 amino acid residues (SEQ ID NO: 1). The Arabidopsis gcpE gene corresponding to this cDNA is located on chromosome V (genomic P1 clone MUP24, accession number AB005246). This gene contains 20 exons that span 4 kb of genomic sequence.
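The ORF and protein lengths are related by simple codon arithmetic: 2223 bp is 741 codons, and if the ORF length counts the terminal stop codon (as the numbers suggest), that leaves 740 residues. A minimal Python sketch of that conversion:

```python
def orf_protein_length(orf_bp, includes_stop=True):
    """Residues encoded by an ORF of orf_bp nucleotides."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    # A terminal stop codon is part of the ORF but adds no residue.
    return codons - 1 if includes_stop else codons


print(orf_protein_length(2223))  # 740
```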

Alignment of the E. coli and Arabidopsis gcpE proteins shows high similarity but also striking differences. The first 75 amino acid residues of the Arabidopsis sequence constitute a region that is not present in the bacterial counterpart. A transit peptide for plastids is predicted in this region with the ChloroP V1.0 program accessible at the web site www.cbs.dtu.dk/services/ChloroP/ (score 0.53295). According to this program, the processing site of the transit peptide would be located between Arg38 and Ser39 (CS-score 2.392). In vivo import experiments with chloroplasts demonstrated that the N-terminal region of the Arabidopsis protein is a functional transit peptide for plastids.

The putative mature gcpE protein from Arabidopsis is significantly larger than its E. coli counterpart (78 versus 41 kDa). Although the two proteins align and show high similarity at the N- and C-terminal regions, the Arabidopsis isoform possesses several additional amino acid sequences between these two regions, particularly a domain of 268 amino acid residues (30 kDa) that is present only in the Arabidopsis protein (SEQ ID NO: 1).
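The quoted masses agree with the common rule of thumb of roughly 110 Da per amino acid residue. This is an approximation assumed here for illustration, not the method used to obtain the figures in the text. The sketch applies it to the 268-residue insertion and to the mature protein left after removing the 38 residues up to the predicted Arg38/Ser39 processing site.

```python
AVG_RESIDUE_DA = 110.0  # rule-of-thumb average residue mass (approximation)


def approx_kda(n_residues):
    """Approximate protein mass in kDa from a residue count."""
    return n_residues * AVG_RESIDUE_DA / 1000.0


insert_kda = approx_kda(268)       # Arabidopsis-specific domain
mature_kda = approx_kda(740 - 38)  # full length minus predicted transit peptide
print(round(insert_kda, 1))  # 29.5
print(round(mature_kda, 1))  # 77.2
```

Both estimates land close to the stated 30 kDa and 78 kDa; exact masses would come from summing the actual residue masses of SEQ ID NO: 1.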

EXAMPLE 5 

DELETION OF THE gcpE CODING SEQUENCE IN THE E. coli GENOME 

To confirm whether gcpE from E. coli is indeed involved in the MEP pathway, gcpE is deleted in strain EcAB3-1. As mentioned above, mutants of the MEP pathway can be rescued in this strain in the presence of MVA. Deletion of the gcpE gene is accomplished by homologous recombination using construct GC5CAT3 as the donor cassette. In this construct, the CAT gene is flanked by the regions that flank gcpE in the genome. Substitution of the CAT gene for the gcpE coding sequence in the genome can be selected by chloramphenicol resistance.

Four PCR reactions are necessary to prepare the GC5CAT3 construct. First, a genomic region of 3231 bp, encompassing the gcpE ORF (1116 bp) together with flanking regions, is amplified by PCR using genomic DNA from strain MC4100 as template. The PCR reaction mixture is prepared in a final volume of 50 µl containing 250 ng of template, 0.4 µM of primers 1PE (SEQ ID NO: 70) and 4PE (SEQ ID NO: 73), 200 µM of dNTPs, 1 mM of MgSO4, Pfx reaction buffer and 1.25 units of PLATINUM Pfx DNA polymerase (Life Technologies Inc., Rockville, Maryland). The sample is denatured for 2 minutes at 94°C and incubated for 30 cycles consisting of 40 seconds at 94°C, 50 seconds at 67°C and 3 minutes 30 seconds at 68°C.
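The cycling program can be written down as data to estimate how long the run occupies the thermocycler. The sketch below only sums the programmed hold times; ramp times between temperatures are deliberately not modelled (an assumption), so a real run takes somewhat longer.

```python
# (temperature in C, hold time in s) for each step, taken from the text.
INITIAL = (94, 2 * 60)
CYCLE = [(94, 40), (67, 50), (68, 3 * 60 + 30)]
N_CYCLES = 30


def hold_time_s(initial, cycle, n_cycles):
    """Sum of programmed hold times; ramp times are not modelled."""
    return initial[1] + n_cycles * sum(seconds for _, seconds in cycle)


total_s = hold_time_s(INITIAL, CYCLE, N_CYCLES)
print(total_s, "s =", total_s // 60, "min")  # 9120 s = 152 min
```

The same function covers the shorter flanking-region program (2 minutes, then 10 cycles of 40 s and 2 min) by changing the cycle list.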

The regions flanking the gcpE coding sequence are amplified by PCR using the PCR product of primers 1PE and 4PE as template. Primers 1PE (SEQ ID NO: 70) and 22PE (SEQ ID NO: 71) are used to amplify the 5' flanking region. In this PCR, primer 22PE generates a Sma I restriction site. Primers 3PE (SEQ ID NO: 72) and 4PE (SEQ ID NO: 73) are used to amplify the 3' flanking region. In this PCR, primer 3PE generates a Pme I restriction site. The reaction mixtures of these PCRs are prepared in final volumes of 50 µl containing 150 ng of template, 4 µM of primers, 200 µM of dNTPs, Pfx reaction buffer and 1.25 units of PLATINUM Pfx DNA polymerase. The samples are denatured for 2 minutes at 94°C and incubated for 10 cycles consisting of 40 seconds at 94°C and 2 minutes at 68°C. The PCR product corresponding to the 3' flanking region (1061 bp) is cloned into the Sma I restriction site of plasmid pBluescript SK+. The plasmid obtained is named GC3. Subsequently, the PCR product corresponding to the 5' flanking region (1102 bp) is cloned into the Pme I restriction site of plasmid GC3. The relative orientation of the 3' and 5' flanking regions is the same as in the E. coli genome. The plasmid with the two gcpE flanking regions is named GC53.

The CAT gene is amplified by PCR using plasmid pCAT19 (Fuqua, 1992) as template and oligonucleotides CAT1 (SEQ ID NO: 74) and CAT4 (SEQ ID NO: 75) as primers. The PCR reaction mixture is prepared in a final volume of 50 µl containing 100 ng of template, 1 µM of primers, 100 µM of dNTPs, Pfx reaction buffer and 1.25 units of PLATINUM Pfx DNA polymerase. The sample is denatured for 2 minutes at 94°C and incubated for 20 cycles consisting of 40 seconds at 94°C, 50 seconds at 53°C and 1 minute at 68°C. The PCR product (960 bp) is cloned into the Sma I restriction site of plasmid GC53. The construct obtained is named GC5CAT3. In this construct, the CAT gene has the same orientation as the deleted gcpE gene.
The plasmid containing the GC5CAT3 construct is digested with the HindIII, XbaI and XhoI restriction enzymes to release the recombination cassette. This cassette is amplified by PCR using oligonucleotides 1PE (SEQ ID NO: 70) and 4PE (SEQ ID NO: 73) as primers. The PCR product is used to transform electrocompetent cells of strain EcAB3-1. These cells are plated on 2xTY medium containing 1.5% (w/v) agar, 17 µg/ml chloramphenicol, 6 µg/ml tetracycline, 25 µg/ml kanamycin, 0.2% (w/v) L-(+)-arabinose and 1 mM MVA.

The presence of the CAT gene in place of the gcpE coding sequence in the genome of transformants is confirmed by PCR using oligonucleotides OPE and 5PE as primers. The identity of the PCR product is verified by restriction analysis. Oligonucleotides OPE (SEQ ID NO: 76) and 5PE (SEQ ID NO: 77) are complementary to genomic sequences located outside the region included in the recombination construct. Analysis of transformants confirms both the absence of the original gcpE gene and the presence of the CAT gene. The new strain is named EcAB3-3.
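Because OPE and 5PE anneal outside the recombination cassette, the same primer pair amplifies both the wild-type and the replaced locus, and the replacement shows up as a size shift. The sketch below assumes that exactly the 1116 bp gcpE ORF is exchanged for the complete 960 bp CAT PCR product, ignoring linker and restriction-site sequences; the wild-type OPE/5PE amplicon length is hypothetical because the text does not give it.

```python
def replaced_amplicon_bp(wild_type_bp, removed_bp, inserted_bp):
    """Expected PCR product size after an in-place sequence replacement."""
    return wild_type_bp - removed_bp + inserted_bp


GCPE_ORF_BP = 1116  # gcpE ORF length, from the text
CAT_BP = 960        # CAT PCR product length, from the text

wild_type_bp = 3500  # hypothetical OPE/5PE product from the intact locus
print(replaced_amplicon_bp(wild_type_bp, GCPE_ORF_BP, CAT_BP))  # 3344
```

Under these assumptions the deletion allele runs about 156 bp shorter than the wild-type allele, whatever the actual wild-type amplicon size is.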

Strain EcAB3-3 can grow only in the presence of MVA. A control strain carrying a disruption of the dxs gene (EcAB3-2) is also auxotrophic for MVA.

EXAMPLE 6 
IDENTIFICATION OF GCPE FUNCTION 

Example 5 describes the generation of an E. coli strain with a deletion of the gcpE coding sequence (strain EcAB3-3). In addition to the gcpE deletion, the strain carries the MVA+ transcription unit described in Examples 1, 2 and 3, which allows it to synthesize IPP and DMAPP from mevalonic acid (mevalonate, MVA); together, these features make the strain auxotrophic for MVA. This strain is used to determine which intermediate accumulates as a result of the disruption of the gcpE gene. The gcpE deletion disrupts the MEP pathway, blocking the formation of IPP and DMAPP and creating the need for exogenous MVA to synthesize IPP and DMAPP.

A culture of the E. coli strain with a disrupted gcpE gene is grown in the presence of MVA. After growth, the cells are harvested by centrifugation, washed with culture medium containing no MVA and incubated for 16 hours in a culture medium containing [3H]ME (methylerythritol). Thin layer chromatography separation of the water/ethanol (30:70) extract of the cells affords a radioactive band co-eluting with methylerythritol cyclodiphosphate (isopropanol/water/ethyl acetate, 60:30:10, Rf = 0.56). Carrier material for the latter compound is obtained from Corynebacterium ammoniagenes treated with benzylviologen. Additional data suggest that the radioactive compound may correspond to methylerythritol cyclodiphosphate. On HF hydrolysis, it releases free methylerythritol. Like methylerythritol cyclodiphosphate, it is not affected by alkaline phosphatase, which normally cleaves acyclic diphosphates. This compound is not accumulated by the mva+/dxr- E. coli strain with an intact gcpE gene. In the latter experiment, [3H]ME is incorporated into ubiquinone and menaquinone, which are not labeled in the gcpE-disrupted strain.
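The identification above rests partly on TLC co-migration, where the retention factor Rf is the distance travelled by a spot divided by the distance travelled by the solvent front. The sketch below computes Rf from hypothetical plate measurements and compares it with a reference value using a hypothetical tolerance of 0.02. Co-migration in one solvent system is supporting rather than conclusive evidence, which is why the text adds the HF hydrolysis and alkaline phosphatase tests.

```python
def retention_factor(spot_mm, front_mm):
    """Rf = distance travelled by the spot / distance travelled by the solvent front."""
    if front_mm <= 0 or not 0 <= spot_mm <= front_mm:
        raise ValueError("spot must lie between the origin and the solvent front")
    return spot_mm / front_mm


def co_migrates(rf_a, rf_b, tolerance=0.02):
    """True if two Rf values agree within a (hypothetical) tolerance."""
    return abs(rf_a - rf_b) <= tolerance


# Hypothetical distances (mm) measured from the origin on one plate.
rf_band = retention_factor(44.8, 80.0)
print(round(rf_band, 2))           # 0.56
print(co_migrates(rf_band, 0.56))  # True
```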
Further confirmation of gcpE function will require cell-free assays using radiolabeled methylerythritol cyclodiphosphate, as described below.

EXAMPLE 7 
GCPE ENZYME ASSAYS 

Enzymatic preparation of [14C]methylerythritol 2,4-cyclodiphosphate

The substrate methylerythritol cyclodiphosphate cannot be readily synthesized chemically. Attempts to accumulate the tritiated compound from [3H]ME in the mva+ gcpE- mutant described above result in very low yields. Enzymatic synthesis of [14C]methylerythritol cyclodiphosphate is thus required. This can be achieved using all the known enzymes of the MEP pathway, namely dxs, dxr, ygbP, ychB and ygbB.

Enzymatic syntheses of [14C]1-deoxy-D-xylulose 5-phosphate (DXP) and MEP from [14C]pyruvate isotopomers and D-glyceraldehyde 3-phosphate (GAP) are performed using E. coli strains overexpressing the dxs and dxr genes. To prepare [14C]methylerythritol cyclodiphosphate from the [14C]MEP, the following scheme is used. Three E. coli strains are generated, each overexpressing one of the three remaining genes of the MEP pathway, namely ygbP (pQE31-ygbP, pREP4), ychB (pQE30-ychB, pREP4) and ygbB (pQE30-ygbB, pREP4). Each strain is grown overnight at 37°C in LB medium containing ampicillin and kanamycin. Each culture (2 ml) is used to inoculate the same medium (50 mL), which is then grown for 3 hours until the OD600 reaches 0.5 and induced with IPTG (final concentration 0.1 mM) for 4.5 hours. After centrifugation, the cells of each culture are resuspended in 100 mM Tris-HCl (3 mL, pH 8) and disrupted by sonication (3 × 30 s with 1 min cooling) at 0°C. After centrifugation, the supernatant is stirred for 1 hour at 0°C in the presence of a 50% Ni-NTA slurry (1 mL, Qiagen Inc., Valencia, California).

The lysate/Ni-NTA mixture is loaded onto a column and the flow-through is collected. The column is washed twice with 100 mM Tris-HCl (4 mL, pH 8) containing 50 mM imidazole. The proteins are eluted with 100 mM Tris-HCl (2 mL, pH 8) containing 200 mM imidazole. An additional 1.5 mL of 100 mM Tris-HCl (pH 8) is added to each protein eluate, and the resulting solution is dialyzed against 100 mM Tris-HCl (pH 8) containing 20% glycerol. On a 12% SDS-PAGE gel, the 6xHis-tagged MEP cytidylyl transferase (ygbP), CDP-ME kinase (ychB) and 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase (ygbB) are separated from other cellular components.

Using these pure proteins, [14C]2-C-methyl-D-erythritol 2,4-cyclodiphosphate is prepared in a one-pot procedure. In a typical incubation, [14C]MEP (10 µL, 2.27 × 10⁶ cpm, 15.8 Ci/mol) is incubated with the purified MEP cytidylyl transferase (100 µL, 0.4 mg/mL), 6xHis-tagged CDP-ME kinase (200 µL, 0.15 mg/mL) and 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase (200 µL, 0.6 mg/mL) solutions in 100 mM Tris-HCl (1 mL, pH 8) containing 5 mM CTP, 1 mM ATP, 5 mM MnCl2 and 5 mM MgCl2. The incubation is performed at 37°C for 10 hours.
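The amount of each enzyme in the one-pot incubation follows from volume times concentration, since 1 mg/mL is the same as 1 µg/µL. The sketch reads the enzyme volumes as microliters; the short labels for the enzymes are only for display.

```python
# (volume in uL, concentration in mg/mL) of each purified enzyme, from the text.
ENZYMES = {
    "MEP cytidylyl transferase (ygbP)": (100, 0.4),
    "CDP-ME kinase (ychB)": (200, 0.15),
    "ME-cPP synthase (ygbB)": (200, 0.6),
}


def protein_ug(volume_ul, conc_mg_per_ml):
    """Protein mass in ug; 1 mg/mL equals 1 ug/uL."""
    return volume_ul * conc_mg_per_ml


# Prints 40, 30 and 120 ug respectively.
for name, (vol_ul, conc) in ENZYMES.items():
    print(f"{name}: {protein_ug(vol_ul, conc):.0f} ug")
```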

An aliquot (3 µL) is analyzed on a silica gel plate eluted with isopropanol/water/ethyl acetate (6:3:1). Radioactivity is monitored with a PhosphoImager. A single radioactive compound is detected, which coelutes with unlabeled 2-C-methyl-D-erythritol 2,4-cyclodiphosphate. No radioactivity is found comigrating with ME-CDP. When an aliquot is incubated in the presence of alkaline phosphatase, no [14C]methylerythritol is detected, indicating that no [14C]MEP remains in the incubation mixture.

GCPE Enzyme Test 

When purified His-tagged GCPE is assayed with the [14C]2-C-methyl-D-erythritol 2,4-cyclodiphosphate prepared above, no reaction product is detected. One reason for the lack of activity could be that GCPE needs other proteins to form a complex that diverts 2-C-methyl-D-erythritol 2,4-cyclodiphosphate into the two branches of the MEP pathway. Because of the genetic link of yfgB and yfgA with gcpE (all three are in the same operon of the E. coli genome), these proteins could be part of such a hypothetical enzyme complex. Thus, an expression plasmid containing the genomic region covering yfgB, yfgA and gcpE is constructed and stably transformed into E. coli, creating the strain BL21(DE3)pLys[PET-T7-gcpE-yfgA-yfgB]. This strain and the BL21(DE3)pLys[PET-T7], BL21(DE3)pLys[PET-T7-yfgA-yfgB] and [MVA+, gcpE-/pQE30-AT-gcpE] strains are grown and induced with IPTG using standard conditions.

In a typical experiment, the E. coli strain BL21(DE3)pLys[PET-T7-gcpE-yfgA-yfgB] is grown at 30°C in LB medium (50 mL) containing chloramphenicol (34 µg/mL) and ampicillin (100 mg/mL) until the OD600 reaches 0.65. Induction is then performed with IPTG (0.5 mM) for 6 hours. The cells are harvested by centrifugation (7000 × g, 10 min), resuspended in buffer (4 mL; 50 mM Tris-HCl pH 8, 1 mM PMSF, 1 mM DTT, 5 mM MgCl2) and broken at 0°C by sonication (2 × 30 s, with 1 min cooling). The cell debris is removed by centrifugation (16000 × g, 10 min).

The resulting crude cell-free material (130 µL) is made up with buffer (20 µL) and used for enzyme assays at 37°C for 7 hours and 20 hours with the [14C]2-C-methyl-D-erythritol 2,4-cyclodiphosphate solution (50 µL) obtained as described above. Controls consist of the same mixture, but with the enzyme preparation replaced by buffer. After incubation, an aliquot (9 µl) of each assay is analyzed on a silica plate eluted with isopropanol/water/ethyl acetate (6:3:1). Radioactivity is monitored with a PhosphoImager. For unknown reasons, only the assay with the E. coli BL21(DE3)pLys[PET-T7-gcpE-yfgA-yfgB] extract is successful. In all assays performed with enzyme preparations from other strains, all of the radioactivity comigrates with unlabeled 2-C-methyl-D-erythritol 2,4-cyclodiphosphate, indicating that no reaction occurred. The TLC migration profile is the same as that observed for the control without enzyme.

In all assays performed with the cell-free system prepared from the BL21(DE3)pLys[PET-T7-gcpE-yfgA-yfgB] strain, the substrate concentration decreases and a new compound accumulates. According to its TLC behavior (Rf = 0.85, isopropanol/water/ethyl acetate, 60:30:10), this compound is a non-phosphorylated derivative. Such dephosphorylation is likely, because the test is performed with a crude cell-free system that probably contains phosphatases, and no phosphatase inhibitor was added to the incubation buffer. Dephosphorylation of the reaction product might drive the reaction forward, favoring full consumption of the substrate and, finally, accumulation of a single major product.

The same compound is obtained when only MgCl2 is present in the assay, suggesting that the other cofactors tested are not necessary. The in situ dephosphorylation of the product may have contributed to its accumulation. The dephosphorylated new compound (Rf = 0.56, CHCl3/CH3OH, 8:2) has an Rf between those of methylerythritol (Rf = 0.22) and isopentenol (Rf = 0.56). TLC comparison with unlabeled synthetic carriers indicates that compounds 1 to 9 (shown in Figure 1) do not correspond to the non-phosphorylated new compound.

To fully characterize the dephosphorylated product, a larger-scale incubation (10X) is performed and the residue is acetylated overnight (pyridine/Ac2O, 10 ml). After removal of the reagents, the residue is resuspended in CHCl3 (12 ml) and the resulting precipitate is removed by filtration. The filtrate is concentrated to dryness (836000 cpm, 1.1 g) and purified on a silica column (8 g) eluted with hexane/ethyl acetate (3:1), collecting fractions of 5 ml. An aliquot (4 µl) of each fraction is spotted on TLC plates (hexane/ethyl acetate, 3:1) and the radioactivity is monitored with a PhosphoImager. The radioactive fractions with the same Rf are pooled together.

Three radioactive products can be detected: fraction A (200 mg) contains the acetate of the dephosphorylated new compound (Rf = 0.4), fraction B (20 mg) contains 2-C-methyl-D-erythritol triacetate (Rf = 0.2), and fraction C (100 mg) contains another new compound (Rf = 0.25) that is not yet identified. Fraction A is further purified on a silica column (9 g) eluted first with CH2Cl2 to remove almost all impurities and then with ethyl acetate to recover the radioactive product. As previously described, an aliquot (4 µl) of each 2 ml fraction is checked for radioactivity, and the radioactive fractions are pooled and concentrated to dryness, yielding almost pure acetate of the dephosphorylated new compound (1 mg).

This compound is analyzed by 1H-NMR, and from the resulting spectrum it is concluded that the acetate of the putative dephosphorylated GCPE product could be the diacetate of (E)-2-methylbut-2-ene-1,4-diol. The spectrum is compared with that of a synthetic reference diacetate of (E)-2-methylbut-2-ene-1,4-diol, prepared by LiAlH4 reduction of methylfumaric acid as previously described for the reduction of 3-methylfuran-2(5H)-one or citraconic anhydride (Duvold et al., Tetrahedron Letters 38: 6181-6184, 1997). All signals of the enzymatic product match the corresponding signals of the synthetic standard. Furthermore, the enzymatic radioactive product coelutes with the synthetic diacetate of (E)-2-methylbut-2-ene-1,4-diol (CH2Cl2, Rf = 0.25). Therefore, one product of the incubation is identified as the diacetate of (E)-2-methylbut-2-ene-1,4-diol (Figure 2). This positive identification suggests that the product of the GCPE reaction with 2-C-methyl-D-erythritol 2,4-cyclodiphosphate is (E)-1-(4-hydroxy-3-methylbut-2-enyl) diphosphate (Figure 3).



EXAMPLE 8

CHARACTERIZATION OF ARABIDOPSIS GCPE 

Upon identification of the Escherichia coli gcpE gene as involved in the trunk line of the MEP pathway for isoprenoid biosynthesis, the available databases are searched for plant homologs. As described in Example 4, clone 135H1 (GenBank accession number T46582) is identified as containing an Arabidopsis thaliana cDNA encoding a protein with homology to the product of the bacterial gcpE gene. As shown in Figure 4, however, the putative Arabidopsis GCPE protein (SEQ ID NO: 79) contains several domains that are absent from the E. coli protein (SEQ ID NO: 78). In Figure 4, identical residues are in black boxes, conservative changes are in grey boxes, and gaps are indicated with dots. The predicted cleavage site for the plastidial targeting peptide (according to the ChloroP program; genome.cbs.dtu.dk/services/chlorop) is indicated with an arrow (see Figure 4).

To determine whether the Arabidopsis protein encoded by clone 135H1 is indeed a GCPE protein, a complementation assay is carried out using the E. coli strain EcAB3-3. In this strain, which is engineered to synthesize IPP and DMAPP from mevalonic acid (MVA), the chromosomal gcpE coding sequence is replaced by the CAT marker conferring chloramphenicol resistance. Because the loss of gcpE is lethal, mutant EcAB3-3 cells require MVA for growth (see Example 5).

For the complementation assay, plasmid pQE-AGH is created by subcloning a BglII-SphI fragment (coding sequence SEQ ID NO: 80 and deduced amino acid sequence SEQ ID NO: 81) from clone 135H1 into the BamHI-SphI sites of the pQE30 expression vector (coding sequence SEQ ID NO: 82 and deduced amino acid sequence SEQ ID NO: 83) (Qiagen) (Figure 5). The resulting construct encodes a His-tagged protein (coding sequence SEQ ID NO: 84 and deduced amino acid sequence SEQ ID NO: 85) lacking the N-terminal sequence predicted by the ChloroP program to be a plastidial targeting peptide (Figure 5). Expression from plasmid pQE-AGH is under the control of the IPTG-inducible T5 promoter. In Figure 5, the coding sequences are shown in uppercase with the deduced amino acid sequences below them, and the predicted cleavage site for the plastidial targeting peptide is indicated with an arrow.

EcAB3-3 cells are transformed with plasmid pQE-AGH and plated on LB plates containing 100 mg/l kanamycin (to select for the MVA operon), 34 mg/l chloramphenicol (to select for the gcpE gene disruption), 100 mg/l ampicillin (to select for transformants containing pQE-AGH), 0.04% arabinose (to induce expression of the MVA operon genes) and 0.5 mM MVA (for IPP and DMAPP biosynthesis). The resulting strain, EcAB3-3(pQE-AGH), is able to grow in the absence of MVA at 30°C and 37°C, confirming that the MVA auxotrophy can be overcome by the presence of plasmid pQE-AGH. These results demonstrate that the cloned Arabidopsis cDNA encodes a protein with the same activity as the E. coli GCPE protein.

To study whether the truncated Arabidopsis GCPE protein cloned in plasmid pQE-AGH is active in converting ME-cPP to the next intermediate of the MEP pathway, the protein is expressed at high levels in E. coli. Strains XL1-Blue and M15 (Qiagen Inc., Valencia, California) are tested for expression under several experimental conditions (growth at 23°C, 30°C or 37°C and induction with 1 or 0.4 mM IPTG), without success. When strain EcAB3-3(pQE-AGH) is used, however, expression of the cloned protein is detected.
An overnight culture of EcAB3-3(pQE-AGH) cells, grown in LB medium supplemented with kanamycin, chloramphenicol, ampicillin and arabinose, with or without MVA, at the concentrations described above, is diluted 1:50 in fresh medium and incubated at 37°C until reaching an OD600 of ca. 0.3. Although cells grow better when MVA is added to the medium, the presence of plasmid pQE-AGH is sufficient to allow growth in the absence of any exogenous source for isoprenoid synthesis. Expression of the truncated Arabidopsis GCPE protein is induced by adding IPTG to a final concentration of 0.4 mM.
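How long the diluted culture needs to reach OD600 ca. 0.3 can be estimated by assuming exponential growth, t = Td × log2(OD_target / OD_start). The sketch below ignores lag phase and assumes OD600 stays proportional to cell density; the overnight OD600 and the 30 minute doubling time are hypothetical values, and only the 1:50 dilution and the 0.3 target come from the text.

```python
import math


def minutes_to_od(od_start, od_target, doubling_time_min):
    """Time for exponential growth from od_start to od_target.

    Ignores lag phase and assumes OD600 is proportional to cell density.
    """
    if od_start <= 0 or od_target <= od_start:
        raise ValueError("need 0 < od_start < od_target")
    return doubling_time_min * math.log2(od_target / od_start)


overnight_od = 3.0            # hypothetical OD600 of the overnight culture
start_od = overnight_od / 50  # 1:50 dilution, from the text
# Hypothetical 30 min doubling time; prints 70 (minutes).
print(round(minutes_to_od(start_od, 0.3, doubling_time_min=30)))
```

Cells carrying pQE-AGH without MVA grow more slowly according to the text, so in that condition a longer doubling time would apply.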

After incubation at 30°C for 4 hours, cells are collected by centrifugation and resuspended in 1/50 volume of homogenization buffer (20 mM Tris-HCl pH 8.0, 1 mM β-mercaptoethanol, 1 mg/ml lysozyme, 80 mg/l PMSF, and 1 tablet/20 ml of Complete Mini, EDTA-free Protease Inhibitor Cocktail Tablets (Roche Molecular Biosystems, Indianapolis, Indiana)). Following incubation at room temperature for 20 minutes, cells are sonicated 5 times for 30 seconds at 30 W. The insoluble fraction is pelleted by centrifugation at 5000 × g for 30 minutes and the supernatant (soluble fraction) is collected. SDS-PAGE of an aliquot of this soluble fraction shows that a protein of the expected size (ca. 78 kDa) is expressed in cells grown with or without MVA.

Purification of the His-tagged protein from the soluble extract is carried out using HiTrap columns (Pharmacia, Uppsala, Sweden). The flow rate through the column is kept constant at 2.5 ml/min during all steps. After the sample is applied to a column and unbound proteins are washed off with 20 ml of washing buffer (20 mM Tris-HCl pH 8.0, 10 mM imidazole, 500 mM NaCl), elution is performed with 50 ml of a gradient from 10 mM to 500 mM imidazole, and 2.5 ml fractions are collected. The truncated Arabidopsis GCPE protein elutes at 100 mM imidazole and is virtually pure.
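To locate the fraction in which the protein should appear, the elution volume can be estimated from the gradient. The sketch assumes an ideal linear gradient with no system dead volume, neither of which the text states; under those assumptions 100 mM imidazole is reached about 9.2 ml into the 50 ml gradient, that is, in the fourth 2.5 ml fraction.

```python
def imidazole_at_volume(v_ml, start_mm=10.0, end_mm=500.0, gradient_ml=50.0):
    """Imidazole (mM) at v_ml into an ideal linear gradient (assumed)."""
    v = min(max(v_ml, 0.0), gradient_ml)
    return start_mm + (end_mm - start_mm) * v / gradient_ml


def volume_for_concentration(c_mm, start_mm=10.0, end_mm=500.0, gradient_ml=50.0):
    """Inverse of imidazole_at_volume: gradient volume (ml) at c_mm."""
    return (c_mm - start_mm) / (end_mm - start_mm) * gradient_ml


def fraction_number(v_ml, fraction_ml=2.5):
    """1-based index of the fraction that collects volume v_ml."""
    return int(v_ml // fraction_ml) + 1


v = volume_for_concentration(100.0)
print(round(v, 2), fraction_number(v))  # 9.18 4
```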

EXAMPLE 9 

PREPARATION OF PLANT EXPRESSION VECTORS WITH GCPE

Rice, soybean and E. coli gcpE genes are chosen for plant expression. An E. coli gcpE gene (SEQ ID NO: 3) is excised by NcoI/EcoRI restriction digestion, gel purified, and ligated into NcoI/EcoRI-digested and gel-purified pMON26541, resulting in a shuttle vector. This ligation fuses the bacterial gcpE gene to CTP1, the chloroplast target peptide of the small subunit of ribulose bisphosphate carboxylase from Arabidopsis, and places it under e35S promoter control.

To place the gcpE gene under napin promoter control, the shuttle vector is digested with EcoRI, the ends are filled in using the Klenow fragment, and the gel-purified vector is digested with Bgl II. The smaller fragment encoding the gcpE gene fused to CTP1 is gel purified. pCGN3224 is digested with PstI, the ends are filled in with Klenow fragment, and the vector is subsequently digested with Bgl II and gel purified. The purified vector and the purified CTP1::gcpE fusion are then ligated into digested and gel-purified pGCN3223.

To transfer the E. coli gcpE gene into an Arabidopsis binary vector, pGCN3223 is digested with HindIII and SacI, and the gel-purified fragment carrying the e35S promoter fused to CTP1 and gcpE is ligated into HindIII/SacI-digested and gel-purified pMON26543, resulting in a vector containing gcpE under e35S promoter control. The pNapin binary expression vector is obtained by ligating the gel-purified NotI fragment harboring the pNapin::CTP1::gcpE::napin 3' expression cassette into NotI-digested pMON36176.

Seed-specific expression vectors for a rice gcpE (SEQ ID NO: 2) and a soybean gcpE (SEQ ID NO: 6) sequence are constructed using a pBin19 (Bevan, Nucleic Acids Research 12: 8711-8720, 1984) derivative. The plasmid contains the Vicia faba seed-specific promoter from the Legumin B4 gene (Baumlein et al., Nucleic Acids Research 14: 2707-2719, 1996), the sequence encoding the transit peptide of the Nicotiana tabacum transketolase (TkTp) (R. Badur, Ph.D. thesis, Georg August University of Göttingen, Germany, 1998) and the transcriptional termination sequence from the octopine synthase gene (Gielen et al., EMBO J. 3: 835-846, 1984). A rice gcpE (SEQ ID NO: 2) sequence is cloned in sense orientation as a Bam HI fragment into the Bam HI site of the pBin-LePTkTp9 vector, resulting in a recombinant rice gcpE expression vector. A recombinant soybean gcpE (SEQ ID NO: 6) expression vector is created similarly.

EXAMPLE 10 
TRANSFORMATION OF PLANTS 

Agrobacterium transformed with the vectors of Example 9, and with pQE-AGH (which contains the Arabidopsis gcpE gene), is prepared as follows. 100 µl of an overnight culture is spread on an LB agar plate with antibiotics. The plate is placed upside down in a 30°C chamber overnight and removed after colonies have grown (24-48 hours). A small-scale culture is started by placing 10 ml of liquid LB medium in a 50 ml tube and adding 10 µl kanamycin (50 µg/µl), 10 µl spectinomycin (75-100 µg/µl) and 10 µl chloramphenicol (25 µg/µl). Agrobacterium is added from a plate, and the tube is shaken and placed in a 30°C shaker overnight.
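The antibiotic additions follow a fixed 1:1000 ratio (10 µl of stock per 10 ml of culture, and later 200 µl per 200 ml), so the final concentration in µg/ml is numerically equal to the stock concentration in µg/µl. The sketch below makes that arithmetic explicit; it ignores the small volume added by the stock itself.

```python
def final_conc_ug_per_ml(stock_ug_per_ul, added_ul, culture_ml):
    """Final antibiotic concentration (ug/ml) after adding stock to a culture.

    Ignores the volume contributed by the stock (negligible at 1:1000).
    """
    return stock_ug_per_ul * added_ul / culture_ml


# 10 ml starter culture with 10 ul of each stock (values from the text).
print(final_conc_ug_per_ml(50, 10, 10))    # kanamycin: 50.0
print(final_conc_ug_per_ml(25, 10, 10))    # chloramphenicol: 25.0
# The 200 ml scale-up keeps the same 1:1000 ratio.
print(final_conc_ug_per_ml(50, 200, 200))  # kanamycin: 50.0
```

By the same ratio, the spectinomycin stock (75-100 µg/µl) gives 75-100 µg/ml in the culture.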

Following overnight growth of the 10 ml culture, the culture is transferred to a 500 ml flask: 200 ml of liquid LB is placed in the flask, 200 µl kanamycin (50 µg/µl), 200 µl spectinomycin (75-100 µg/µl) and 200 µl chloramphenicol (25 µg/µl) are added, and the entire 10 ml overnight culture is then added. The 500 ml flask is placed in a 30°C shaker and grown overnight. The entire 200 ml culture is placed in a centrifuge tube and centrifuged for 25 minutes at 3,750 rpm and 19°C. After centrifugation, the liquid is poured off and the pellet is resuspended in 25 ml of 5% sucrose (0.05% Silwet) solution.

900 µl of the sucrose solution and 100 µl of the 25 ml bacterial suspension are placed in a cuvette, and the cuvette is covered with parafilm and shaken. A blank OD reading is taken with 1 ml of sucrose solution, and then readings of all the bacterial solutions are taken. The OD600 of each culture is recorded. The following calculations are then performed to determine the volume needed to obtain an Agrobacterium culture with OD600 = 0.8: $C_1V_1 = C_2V_2$, where $C_2V_2 = (0.8)(200\ \text{ml}) = 160$, so that $V_1 = 160/C_1$; the resulting volume, $X$ ml, is divided by 10 ($V_1 = X\ \text{ml}/10$) to account for the 1:10 dilution of the culture in the cuvette.
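The normalization is an instance of C1V1 = C2V2. The sketch follows the steps as given: 160 is divided by the cuvette reading, and the result is divided by 10. Reading that final division as the correction for the 100 µl in 1 ml cuvette dilution is an interpretation. The cuvette reading used below is hypothetical, and the sketch does not check that the computed volume fits within the 25 ml suspension.

```python
CUVETTE_DILUTION = 10  # 100 ul suspension in 1 ml total (900 ul sucrose solution)


def volume_for_target_od(od_reading, target_od=0.8, final_volume_ml=200.0,
                         dilution=CUVETTE_DILUTION):
    """Volume (ml) of resuspended culture giving OD600 = target in final_volume_ml.

    od_reading is the blank-corrected OD600 of the diluted cuvette sample.
    """
    c2v2 = target_od * final_volume_ml  # (0.8)(200 ml) = 160
    x_ml = c2v2 / od_reading            # V1 = 160 / C1, using the raw reading
    return x_ml / dilution              # divide by 10 for the cuvette dilution


print(round(volume_for_target_od(1.0), 2))  # 16.0
```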

Plants are soaked in water for at least 30 minutes prior to dipping. The bacterial solution is poured into a shallow plastic container, and the above-ground parts of the plant (bolts, rosettes) are dipped into the solution for 3-5 seconds with gentle agitation. Dipped plants are placed on their side in a diaper-lined black tray and covered by a dome overnight (16-24 hours) to maintain high humidity. The cover is then removed and normal plant growth conditions are resumed for 4 weeks.

Following the transformation and high-humidity treatment, plants are maintained at 22°C, 60% RH and a 16 hour photoperiod for 4 weeks. Plants are coned 5-7 days after transformation. Fertilization with a weak 20-20-20 fertilizer is done weekly. After 4 weeks of growth, plants are placed in the greenhouse and all watering is stopped to encourage plant dry-down for seed harvest. Plants are ready for seed harvest after 1-1.5 weeks of dry-down. Seeds are harvested by cutting the base of the plant below the cones, holding the plant over a seed sieve and a white piece of paper, running the bolts through the cone hole, and collecting clean seeds by sieving.

Seeds are sterilized in a vacuum desiccator whose hose is connected to a vacuum line in a fume hood/flow bench. 100 ml of bleach is placed in a 250 ml beaker, and 3 ml of concentrated HCl is added to the bleach. The beaker is placed in the desiccator, together with the seeds in seed tubes held in a tube holder. A cover is placed on the desiccator and the vacuum is applied. The desiccator is left overnight, but no longer than 16 hours.

Once sterilized, seeds are plated on selection medium, prepared by combining 10 g (2 g/L)
Phyta-Gel, 10.75 g (2.15 g/L) MS Basal Salts (M-5524 from Sigma), 50 g (10 g/L) sucrose,
6 ml (1.2 ml/L) kanamycin solution (950 mg/ml), 5 ml (1 ml/L) cefotaxime solution (250
mg/ml), and 5 ml (1 ml/L) carbenicillin solution (250 mg/ml) in a total volume of 5 liters at
pH 5.7. Seed tubes are tapped lightly over a plate to distribute the seeds sparsely. The plates
are wrapped in Parafilm and placed in a 4°C refrigerator for 1-2 days of cold treatment. After
this cold treatment, the plates are placed in a 28°C chamber for germination.
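Each per-liter figure in parentheses is the batch amount divided by the 5 liter total volume. The Python sketch below scales the recipe to another batch volume. It assumes every component scales linearly with volume, and the dictionary keys are illustrative labels rather than catalog names.

```python
# Per-liter amounts of the selection medium described above.
# Solids are in grams; antibiotic stock solutions are in milliliters.
PER_LITER = {
    "Phyta-Gel (g)": 2.0,
    "MS basal salts (g)": 2.15,
    "sucrose (g)": 10.0,
    "kanamycin stock (ml)": 1.2,
    "cefotaxime stock (ml)": 1.0,
    "carbenicillin stock (ml)": 1.0,
}


def scale_recipe(volume_l: float, per_liter: dict = PER_LITER) -> dict:
    """Return the amount of each component needed for volume_l liters of medium."""
    if volume_l <= 0:
        raise ValueError("volume must be positive")
    # Rounding suppresses floating-point noise such as 10.750000000000002.
    return {name: round(amount * volume_l, 3) for name, amount in per_liter.items()}


if __name__ == "__main__":
    for name, amount in scale_recipe(5).items():
        print(f"{name}: {amount}")
```

For a 5 liter batch this reproduces the totals given in the text, for example 10.75 g of MS basal salts and 6 ml of kanamycin stock.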

Selected plantlets are green and are developing secondary leaves. Once the secondary
leaves have developed, the selected plantlets are moved to soil. The plantlets are potted in
soil and covered with a dome for 5 days to maintain high humidity. The plantlets are moved
to a greenhouse after the bottom siliques begin to turn yellow.

Seeds from the selected plantlets are grown in 2.5 inch pots with soil (½ Metro-200;
½ PGX Mix). The soil is mounded and the pot is covered with a mesh screen, which is
fastened to the pot with a rubber band. Seeds are sown and covered with a germination
dome. The seedlings are grown under a 12-hour photoperiod at 70% relative humidity and 22°C.
Water is supplied every other day as needed, and Peter's 20-20-20 fertilizer is applied from
below bi-weekly.

EXAMPLE 11 

PRODUCTION OF SEEDS FROM TRANSGENIC PLANTS 

Transgenic seed plants from Example 10 representing 20 independent transformation
events are grown and seeds are harvested to produce T2 seeds. The T2 seeds are grown and tested
for tocopherol levels. To determine tocopherol levels, 10 to 15 mg of Arabidopsis seed is
placed in a 2 mL microtube. One gram of 0.5 mm microbeads (Biospecifics
Technologies Corp., Lynbrook, NY) and 500 µl of 1% pyrogallol (Sigma Chem, St. Louis, MO)
in ethanol containing 5 µg/mL tocol are added to the tube. The sample is shaken twice for 45
seconds in a FastPrep (Bio 101/Savant) at a speed of 6.5. The extract is filtered (Gelman
PTFE Acrodisc 0.2 µm, 13 mm syringe filters, Pall Gelman Laboratory Inc., Ann Arbor, MI)
into an autosampler tube. HPLC is performed on a Zorbax silica HPLC column, 4.6 mm x
250 mm (5 µm), with fluorescence detection using a Hewlett Packard HPLC (Agilent
Technologies, Palo Alto, CA). Samples are excited at 290 nm, and emission is
monitored at 336 nm. Tocopherols are separated with a hexane/methyl-t-butyl ether gradient
using an injection volume of 20 µl, a flow rate of 1.5 ml/min, and a run time of 12 min
(40°C). Tocopherol concentration and composition are calculated from standard curves
for α-, δ-, and γ-tocopherol using ChemStation software (Agilent Technologies, Palo Alto,
CA).
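In this example the quantification is done in ChemStation. The sketch below only illustrates the arithmetic of a linear external standard curve. All numbers are illustrative and are not data from this example. Three further points are assumptions of the sketch:

- the curve relates peak area to ng/µl in the 500 µl extract;
- the default seed mass is a mid-range value;
- the optional `recovery` correction uses the tocol internal standard, whose use in the calculation the text does not specify.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired standards")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("standards must span more than one concentration")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def tocopherol_per_gram(peak_area, slope, intercept,
                        extract_volume_ul=500.0, seed_mass_mg=12.5,
                        recovery=1.0):
    """Convert one peak area into micrograms of tocopherol per gram of seed.

    recovery is an assumed correction factor, e.g. tocol found / tocol added.
    """
    conc_ng_per_ul = (peak_area - intercept) / slope   # invert the standard curve
    total_ng = conc_ng_per_ul * extract_volume_ul / recovery
    return total_ng / seed_mass_mg                      # ng per mg equals µg per g


# Illustrative standards (ng/µl versus peak area) and one sample peak.
slope, intercept = fit_line([0.5, 1.0, 2.0, 4.0], [50.0, 100.0, 200.0, 400.0])
print(tocopherol_per_gram(150.0, slope, intercept))
```

A separate curve would be fitted for each of α-, δ- and γ-tocopherol. Composition then follows as each isoform's share of the summed total.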






WO 02/12478 PCT/US01/24335

EXAMPLE 12 

TRANSGENIC PLANTS WITH GCPE AND OTHER TOCOPHEROL BIOSYNTHESIS GENES

Canola, Brassica napus, and soybean plants are transformed with a variety of DNA
constructs using a particle bombardment approach essentially as set forth in Christou (1996)
or using Agrobacterium-mediated transformation. Two sets of DNA constructs are produced.

The first set of constructs are "single gene constructs", in which the gcpE gene is
inserted into a plant DNA construct under the control of an arcelin 5, 7S alpha, or napin
promoter (Kridl et al., Seed Sci. Res. 1:209-219, 1991). The products of the gcpE gene can
be targeted to the plastid by an encoded plastid target peptide such as CTP1 (Keegstra, Cell
56:247-253, 1989; Nawrath et al., PNAS 91:12760-12764, 1994).

A second set of DNA constructs, referred to as the "multiple gene constructs", is
generated. The multiple gene constructs contain multiple genes, each under the control of a
napin promoter, and the products of each of the genes are targeted to the plastid by an
encoded plastid target peptide, such as a natural plastid target peptide present in the
transgene or an encoded plastid target peptide such as CTP1.

The multiple gene construct contains the gcpE gene and one or more genes for other
MEP pathway proteins, including, but not limited to: a ygbB gene; a ygbP gene; a ychB gene;
a yfgA gene; a yfgB gene; a bifunctional prephenate dehydrogenase such as the E. herbicola
or E. coli tyrA gene (Xia et al., J. Gen. Microbiol. 138:1309-1316, 1992); a
phytylprenyltransferase such as the slr1736 gene (in Cyanobase,
www.kazusa.or.jp/cyanobase) or the ATPT2 gene (Smith et al., Plant J. 11:53-92, 1997); a
deoxyxylulose synthase such as the E. coli dxs gene (Lois et al., PNAS 95(5):2105-2110,
1998); a deoxyxylulose reductoisomerase such as the dxr gene (Takahashi et al., PNAS
95(17):9879-9884, 1998); an Arabidopsis thaliana HPPD gene (Norris et al., Plant Physiol.
117:1317-1323, 1998); an Arabidopsis thaliana GGPPS gene (Bartley and Scolnik, Plant
Physiol. 104:1469-1470, 1994); a transporter such as the AANT1 gene (Saint Guily et al.,
Plant Physiol. 100:1069-1071, 1992); a GMT gene (WO 00/32757, WO 00/10380); an
MT1 gene; a tocopherol cyclase such as the slr1737 gene (in Cyanobase) or its Arabidopsis
ortholog; an isopentenyl diphosphate isomerase (IDI) gene; and an antisense construct for
homogentisic acid dioxygenase (Sato et al., DNA Res. 7(1):31-63, 2000).

Each construct is transformed into at least one canola, Brassica napus, and soybean
plant. Plants expressing each of these genes are selected to participate in additional crosses.
The tocopherol composition and level in each plant are also analyzed using the method set
forth in Example 11.

The tocopherol composition and level in each plant generated by the crosses
(including all intermediate crosses) are also analyzed using the method set forth in Example
11. Progeny of the transformants from these constructs will be crossed with each other to
stack the additional genes to reach the desired level of tocopherol.

Crosses are carried out for each species to generate transgenic plants having one or
more of the following combination of introduced genes: gcpE, ygbB, ygbP, ychB, yfgA, yfgB,
tyrA, slr1736, ATPT2, dxs, dxr, GGPPS, HPPD, GMT, AANT1, slr1737, IDI, and an antisense
construct for homogentisic acid dioxygenase.
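Stacking genes by crossing depends on how the transgenes segregate, so only a fraction of the progeny from each cross carries the full stack. The Python sketch below gives the expected fraction of progeny from one cross that inherit every transgene present in either parent. It assumes each transgene is a single-copy insertion at its own unlinked locus, so loci assort independently. Each parent is described by a hypothetical mapping from gene name to copy number at that locus: 1 means hemizygous and 2 means homozygous.

```python
from fractions import Fraction


def transmit_prob(copies: int) -> Fraction:
    """Chance that one gamete carries the transgene at a locus."""
    # 0 copies: never transmitted; hemizygous: half the gametes; homozygous: all.
    return {0: Fraction(0), 1: Fraction(1, 2), 2: Fraction(1)}[copies]


def prob_progeny_carries_all(parent_a: dict, parent_b: dict) -> Fraction:
    """Expected fraction of progeny inheriting at least one copy of every transgene."""
    total = Fraction(1)
    for gene in set(parent_a) | set(parent_b):
        # A progeny plant lacks the gene only if neither gamete carries it.
        miss_a = 1 - transmit_prob(parent_a.get(gene, 0))
        miss_b = 1 - transmit_prob(parent_b.get(gene, 0))
        total *= 1 - miss_a * miss_b
    return total


# Two hemizygous single-gene transformants: one quarter of progeny carry both.
print(prob_progeny_carries_all({"gcpE": 1}, {"dxs": 1}))
```

Under these assumptions, crossing plants that are homozygous for their transgenes raises the proportion of progeny carrying the full stack. Linked insertions would make the simple product incorrect.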

The above description, sequences, drawings and examples are only illustrative of 
preferred embodiments that achieve the objects, features and advantages of the present 
invention. It is not intended that the present invention be limited to the illustrative 
embodiments. Any modification of the present invention which comes within the spirit and
scope of the following claims should be considered part of the present invention.
What is claimed is:

1. A substantially purified nucleic acid molecule that encodes a protein
comprising an amino acid sequence selected from the group consisting of SEQ ID NOs: 4
and 48 through 50.

2. The substantially purified nucleic acid molecule of claim 1, wherein said
protein is operably linked to a chloroplast transit peptide-encoding sequence.

3. The substantially purified nucleic acid molecule of claim 1, wherein said
nucleic acid molecule comprises a nucleic acid sequence selected from the group consisting
of SEQ ID NOs: 1 through 3, 5 through 47, and complements thereof.

4. The substantially purified nucleic acid molecule of claim 1, wherein said 
nucleic acid molecule hybridizes under moderate stringency conditions to a nucleic acid 
sequence selected from the group consisting of SEQ ID NOs: 1 through 3, 5 through 47, and 
complements thereof. 

5. The substantially purified nucleic acid molecule of claim 1, wherein said
nucleic acid molecule has greater than 85% identity to a nucleic acid sequence selected from
the group consisting of SEQ ID NOs: 1 through 3, 5 through 47, and complements thereof.

6. A substantially purified nucleic acid molecule that encodes a protein 
comprising an amino acid sequence of SEQ ID NO: 4. 

7. The substantially purified nucleic acid molecule of claim 6, wherein said
nucleic acid molecule comprises the nucleic acid sequence of SEQ ID NO: 2.

8. A substantially purified nucleic acid molecule that encodes a protein 
comprising an amino acid sequence of SEQ ID NO: 48. 

9. The substantially purified nucleic acid molecule of claim 8, wherein said
nucleic acid molecule comprises the nucleic acid sequence of SEQ ID NO: 1.

10. A substantially purified nucleic acid molecule that encodes a protein 
comprising an amino acid sequence of SEQ ID NO: 49. 

11. The substantially purified nucleic acid molecule of claim 10, wherein said
nucleic acid molecule comprises the nucleic acid sequence of SEQ ID NO: 2.

12. A substantially purified nucleic acid molecule that encodes a protein
comprising an amino acid sequence of SEQ ID NO: 50.

13. The substantially purified nucleic acid molecule of claim 12, wherein said 
nucleic acid molecule comprises the nucleic acid sequence of SEQ ID NO: 3. 

14. The substantially purified nucleic acid molecule of claim 13, wherein said
protein is operably linked to a chloroplast transit peptide-encoding sequence.
15. A substantially purified nucleic acid molecule that encodes a GCPE protein,
wherein said nucleic acid molecule comprises a nucleic acid sequence selected from the 
group consisting of SEQ ID NOs: 1 through 3, 5 through 47, and complements thereof. 

16. A recombinant nucleic acid molecule comprising as operably linked
components: (A) a promoter; and (B) a heterologous nucleic acid molecule that encodes an
amino acid sequence selected from the group consisting of SEQ ID NOs: 4 and 48 through 50.

17. The recombinant nucleic acid molecule of claim 16, wherein the promoter is 
a seed-specific promoter. 

18. The recombinant nucleic acid molecule of claim 17, wherein the seed-specific
promoter is a napin promoter.

19. A recombinant nucleic acid molecule comprising as operably linked 
components: (A) an exogenous promoter; and (B) a nucleic acid sequence selected from the 
group consisting of SEQ ID NOs: 1 through 3, 5 through 47, and complements thereof. 

20. The recombinant nucleic acid molecule of claim 19, wherein the promoter is
a seed-specific promoter.

21. The recombinant nucleic acid molecule of claim 20, wherein the seed-specific
promoter is a napin promoter.

22. The recombinant nucleic acid molecule of claim 19, wherein said nucleic
acid molecule further comprises a second nucleic acid sequence that encodes at least one
MEP pathway protein.

23. The recombinant nucleic acid molecule of claim 22, wherein said at least one 
MEP pathway protein comprises a yfgA protein and a yfgB protein. 

24. A recombinant nucleic acid molecule comprising as operably linked
components: (A) a promoter that functions in a plant cell to cause production of an mRNA
molecule; and (B) a nucleic acid sequence that hybridizes under moderate stringency
conditions to a nucleic acid sequence selected from the group consisting of SEQ ID NOs: 1
through 3, 5 through 47, and complements thereof.

25. A recombinant nucleic acid molecule comprising as operably linked
components: (A) a promoter that functions in a plant cell to cause production of an mRNA
molecule; and (B) a nucleic acid sequence that has greater than 85% identity to a nucleic acid
sequence selected from the group consisting of SEQ ID NOs: 1 through 3, 5 through 47, and
complements thereof.

26. A transformed cell comprising the recombinant nucleic acid molecule of 
claim 16. 
27. A transformed cell comprising the recombinant nucleic acid molecule of
claim 19. 

28. The transformed cell of claim 27, wherein the cell is selected from the group 
consisting of a bacterial cell, a mammalian cell, an insect cell, a plant cell, and a fungal cell. 

29. The transformed cell of claim 28, wherein the bacterial cell is Agrobacterium
tumefaciens.

30. A substantially purified protein comprising an amino acid sequence selected 
from the group consisting of SEQ ID NOs: 4, 48, and 49. 

31. An antibody capable of specifically binding a protein comprising an amino
acid sequence selected from the group consisting of SEQ ID NOs: 4, 48 and 49.

32. A transgenic plant comprising the recombinant nucleic acid molecule of 
claim 16. 

33. The transgenic plant of claim 32, wherein said transgenic plant exhibits an
increased tocopherol level relative to a plant with a similar genetic background but lacking
the recombinant nucleic acid molecule.

34. The transgenic plant of claim 32, wherein said transgenic plant produces a 
seed with an increased tocopherol level relative to a plant with a similar genetic background 
but lacking the recombinant nucleic acid molecule. 

35. The transgenic plant of claim 32, wherein said transgenic plant exhibits an
increased monoterpene level relative to a plant with a similar genetic background but lacking
the recombinant nucleic acid molecule.

36. The transgenic plant of claim 32, wherein said transgenic plant produces a 
seed with an increased monoterpene level relative to a plant with a similar genetic 
background but lacking the recombinant nucleic acid molecule. 

37. The transgenic plant of claim 32, wherein said transgenic plant exhibits an
increased carotenoid level relative to a plant with a similar genetic background but lacking
the recombinant nucleic acid molecule.

38. The transgenic plant of claim 32, wherein said transgenic plant produces a
seed with an increased carotenoid level relative to a plant with a similar genetic background
but lacking the recombinant nucleic acid molecule.

39. The transgenic plant of claim 32, wherein said transgenic plant exhibits an 
increased tocotrienol level relative to a plant with a similar genetic background but lacking 
the recombinant nucleic acid molecule. 
40. The transgenic plant of claim 32, wherein said transgenic plant produces a
seed with an increased tocotrienol level relative to a plant with a similar genetic background 
but lacking the recombinant nucleic acid molecule. 

41. A transgenic plant comprising the recombinant nucleic acid molecule of
claim 19.

42. The transgenic plant of claim 41, wherein said transgenic plant is selected
from the group consisting of Brassica campestris, Brassica napus, canola, castor bean,
coconut, cotton, crambe, linseed, maize, mustard, oil palm, peanut, rapeseed, rice, safflower,
sesame, soybean, sunflower, and wheat.

43. The transgenic plant of claim 41, wherein said transgenic plant is selected
from the group consisting of coconut, crambe, maize, oil palm, peanut, rapeseed, safflower,
sesame, soybean, and sunflower.

44. A transgenic plant comprising a nucleic acid molecule that encodes a GCPE
protein, wherein said nucleic acid molecule comprises a promoter operably linked to a
heterologous nucleic acid sequence selected from the group consisting of SEQ ID NOs: 1
through 3, 5 through 47, and complements thereof.

45. The transgenic plant of claim 44, wherein the plant exhibits an increased
isoprenoid compound level relative to a plant with a similar genetic background but lacking
the heterologous nucleic acid sequence.

46. The transgenic plant of claim 45, wherein the isoprenoid compound is
selected from the group consisting of tocotrienols, tocopherols, terpenes, gibberellins,
carotenoids, and xanthophylls.

47. The transgenic plant of claim 45, wherein the isoprenoid compound is a 
monoterpene. 

48. The transgenic plant of claim 45, wherein the isoprenoid compound is
selected from the group consisting of IPP and DMAPP.

49. The transgenic plant of claim 44, wherein the plant exhibits an increased 
tocopherol level relative to a plant with a similar genetic background but lacking the 
heterologous nucleic acid sequence. 

50. The transgenic plant of claim 44, wherein the promoter is a seed-specific
promoter.

51. The transgenic plant of claim 50, wherein the seed-specific promoter is
selected from the group consisting of napin, phaseolin, zein, soybean trypsin inhibitor, ACP,
stearoyl-ACP desaturase, soybean α' subunit of β-conglycinin (soy 7S), and oleosin
promoters.
52. Seed derived from a transgenic plant of claim 41.

53. The seed of claim 52, wherein said seed exhibits an increased tocopherol 
level relative to seed from a plant having a similar genetic background but lacking said 
recombinant nucleic acid molecule. 

54. The seed of claim 52, wherein said seed exhibits an increased -tocopherol
level relative to seed from a plant having a similar genetic background but lacking said
recombinant nucleic acid molecule.

55. The seed of claim 52, wherein said seed exhibits an increased monoterpene
level relative to seed from a plant having a similar genetic background but lacking said
recombinant nucleic acid molecule.

56. The seed of claim 52, wherein said seed exhibits an increased carotenoid 
level relative to seed from a plant having a similar genetic background but lacking said 
recombinant nucleic acid molecule. 

57. The seed of claim 52, wherein said seed exhibits an increased tocotrienol
level relative to seed from a plant having a similar genetic background but lacking said
recombinant nucleic acid molecule.

58. Oil derived from the seed of claim 52. 

59. Meal derived from the seed of claim 52. 

60. Seed derived from a transgenic plant of claim 34. 

61. Oil derived from the seed of claim 60.

62. The oil of claim 61, wherein said oil is produced in a volume greater than 
one liter. 

63. The oil of claim 62, wherein said oil is produced in a volume greater than ten
liters.

64. A container of seeds, wherein at least 25% of the seeds are derived from a
transgenic plant of claim 46.

65. The container of seeds of claim 64, wherein the container contains more than 
2500 seeds. 

66. Feedstock derived from a transgenic plant of claim 45.

67. A plant part derived from a transgenic plant of claim 45.

68. The plant part of claim 67, wherein the plant part is a seed. 

69. The plant part of claim 67, wherein the plant part is a fruit. 

70. A method of producing a transgenic plant with an increased isoprenoid
compound level comprising: (A) transforming the plant with a nucleic acid molecule to
produce a transgenic plant, wherein the nucleic acid molecule comprises a nucleic acid
sequence selected from the group consisting of SEQ ID NOs: 1 through 3, 5 through 47, and
complements thereof; and (B) growing the transgenic plant.

71. The method of claim 70, wherein the plant is selected from the group
consisting of Brassica campestris, Brassica napus, canola, castor bean, coconut, cotton,
crambe, linseed, maize, mustard, oil palm, peanut, rapeseed, rice, safflower, sesame,
soybean, sunflower, and wheat.

72. The method of claim 70, wherein seed of the transgenic plant exhibits an 
increased isoprenoid compound level. 

73. The method of claim 72, wherein seed of the transgenic plant exhibits an
increased tocopherol level.

74. A method of producing a transgenic plant having seed with an increased
isoprenoid compound level comprising: (A) transforming the plant with a nucleic acid
molecule to produce a transgenic plant, wherein the nucleic acid molecule encodes a protein
with an amino acid sequence selected from the group consisting of SEQ ID NOs: 4 and 48-50;
and (B) growing the transgenic plant.

75. The method of claim 74, wherein the plant is selected from the group
consisting of Brassica campestris, Brassica napus, canola, castor bean, coconut, cotton,
crambe, linseed, maize, mustard, oil palm, peanut, rapeseed, rice, safflower, sesame,
soybean, sunflower, and wheat.

76. The method of claim 74, wherein seed of the transgenic plant exhibits an
increased tocopherol level.
FIGURE 2




FIGURE 3 
Venkatesh, Tyamagondlu V.;
Venkatramesh, Mylavarapu 

<120> Methyl-D-Erythritol Phosphate Pathway Genes 

<130> 16516.108/35-21 (51897) PCT

<150> US 60/223,483 

<151> 2000-08-07 

<160> 85 

<210> 1 

<211> 2520 

<212> DNA 

<213> Arabidopsis thaliana 
<220> 

<221> CDS 

<222> (154)..(2376)

<400> 1 

aaaaatcgtc aatccctctc aaactcttct caccactaat ttcttcctct ggaacattct 60 

cttctctatt attttgattc ccttggcctc aacactggtt tctcaattgc atgatcttgg 120 

ctcgtcttca gttactttga ttcactgaga aaa atg gcg act gga gta ttg cca 174 

Met Ala Thr Gly Val Leu Pro 
1 5 

gct ccg gtt tct ggg atc aag ata ccg gat tcg aaa gtc ggg ttt ggt 222
Ala Pro Val Ser Gly Ile Lys Ile Pro Asp Ser Lys Val Gly Phe Gly
10 15 20

aaa agc atg aat ctt gtg aga att tgt gat gtt agg agt cta aga tct 270
Lys Ser Met Asn Leu Val Arg Ile Cys Asp Val Arg Ser Leu Arg Ser
25 30 35

gct agg aga aga gtt tcg gtt atc cgg aat tca aac caa ggc tct gat 318
Ala Arg Arg Arg Val Ser Val Ile Arg Asn Ser Asn Gln Gly Ser Asp
40 45 50 55

tta gct gag ctt caa cct gca tcc gaa gga agc cct ctc tta gtg cca 366
Leu Ala Glu Leu Gln Pro Ala Ser Glu Gly Ser Pro Leu Leu Val Pro
60 65 70

aga cag aaa tat tgt gaa tca ttg cat aag acg gtg aga agg aag act 414
Arg Gln Lys Tyr Cys Glu Ser Leu His Lys Thr Val Arg Arg Lys Thr
75 80 85

cgt act gtt atg gtt gga aat gtc gcc ctt gga agc gaa cat ccg ata 462
Arg Thr Val Met Val Gly Asn Val Ala Leu Gly Ser Glu His Pro Ile
90 95 100

agg att caa acg atg act act tcg gat aca aaa gat att act gga act 510
Arg Ile Gln Thr Met Thr Thr Ser Asp Thr Lys Asp Ile Thr Gly Thr
105 110 115






gtt gat gag gtt atg aga ata gcg gat aaa gga gct gat att gta agg 558
Val Asp Glu Val Met Arg Ile Ala Asp Lys Gly Ala Asp Ile Val Arg
120 125 130 135

ata act gtt caa ggg aag aaa gag gcg gat gcg tgc ttt gaa ata aaa 606
Ile Thr Val Gln Gly Lys Lys Glu Ala Asp Ala Cys Phe Glu Ile Lys
140 145 150

gat aaa ctc gtt cag ctt aat tac aat ata ccg ctg gtt gca gat att 654
Asp Lys Leu Val Gln Leu Asn Tyr Asn Ile Pro Leu Val Ala Asp Ile
155 160 165

cat ttt gcc cct act gta gcc tta cga gtc gct gaa tgc ttt gac aag 702
His Phe Ala Pro Thr Val Ala Leu Arg Val Ala Glu Cys Phe Asp Lys
170 175 180

atc cgt gtc aac cca gga aat ttt gcg gac agg cgg gcc cag ttt gag 750
Ile Arg Val Asn Pro Gly Asn Phe Ala Asp Arg Arg Ala Gln Phe Glu
185 190 195

acg ata gat tat aca gaa gat gaa tat cag aaa gaa ctc cag cat atc 798
Thr Ile Asp Tyr Thr Glu Asp Glu Tyr Gln Lys Glu Leu Gln His Ile
200 205 210 215

gag cag gtc ttc act cct ttg gtt gag aaa tgc aaa aag tac ggg aga 846
Glu Gln Val Phe Thr Pro Leu Val Glu Lys Cys Lys Lys Tyr Gly Arg
220 225 230

gca atg cgt att ggg aca aat cat gga agt ctt tct gac cgt atc atg 894
Ala Met Arg Ile Gly Thr Asn His Gly Ser Leu Ser Asp Arg Ile Met
235 240 245

agc tat tac ggg gat tct ccc cga gga atg gtt gaa tct gcg ttt gag 942
Ser Tyr Tyr Gly Asp Ser Pro Arg Gly Met Val Glu Ser Ala Phe Glu
250 255 260

ttt gca aga ata tgt cgg aaa tta gac tat cac aac ttt gtt ttc tca 990
Phe Ala Arg Ile Cys Arg Lys Leu Asp Tyr His Asn Phe Val Phe Ser
265 270 275

atg aaa gcg agc aac cca gtg atc atg gtc cag gcg tac cgt tta ctt 1038
Met Lys Ala Ser Asn Pro Val Ile Met Val Gln Ala Tyr Arg Leu Leu
280 285 290 295

gtg gct gag atg tat gtt cat gga tgg gat tat cct ttg cat ttg gga 1086
Val Ala Glu Met Tyr Val His Gly Trp Asp Tyr Pro Leu His Leu Gly
300 305 310

gtt act gag gca gga gaa ggc gaa gat gga cgg atg aaa tct gcg att 1134
Val Thr Glu Ala Gly Glu Gly Glu Asp Gly Arg Met Lys Ser Ala Ile
315 320 325

gga att ggg acg ctt ctt cag gac ggg ctc ggt gac aca ata aga gtt 1182
Gly Ile Gly Thr Leu Leu Gln Asp Gly Leu Gly Asp Thr Ile Arg Val
330 335 340

tca ctg acg gag cca cca gaa gag gag ata gat ccc tgc agg cga ttg 1230
Ser Leu Thr Glu Pro Pro Glu Glu Glu Ile Asp Pro Cys Arg Arg Leu
345 350 355

gct aac ctc ggg aca aaa gct gcc aaa ctt caa caa ggc gca ccg ttt 1278
Ala Asn Leu Gly Thr Lys Ala Ala Lys Leu Gln Gln Gly Ala Pro Phe
360 365 370 375






gaa gaa aag cat agg cat tac ttt gat ttt cag cgt cgg acg ggt gat 1326
Glu Glu Lys His Arg His Tyr Phe Asp Phe Gln Arg Arg Thr Gly Asp
380 385 390

cta cct gta caa aaa gag gga gaa gag gtt gat tac aga aat gtc ctt 1374
Leu Pro Val Gln Lys Glu Gly Glu Glu Val Asp Tyr Arg Asn Val Leu
395 400 405

cac cgt gat ggt tct gtt ctg atg tcg att tct ctg gat caa cta aag 1422
His Arg Asp Gly Ser Val Leu Met Ser Ile Ser Leu Asp Gln Leu Lys
410 415 420

gca cct gaa ctc ctc tac aga tca ctc gct aca aag ctt gtc gtg ggt 1470
Ala Pro Glu Leu Leu Tyr Arg Ser Leu Ala Thr Lys Leu Val Val Gly
425 430 435

atg cca ttc aag gat ctg gca act gtt gat tca atc tta tta aga gag 1518
Met Pro Phe Lys Asp Leu Ala Thr Val Asp Ser Ile Leu Leu Arg Glu
440 445 450 455

cta ccg cct gta gat gat caa gtg gct cgt ttg gct cta aaa cgg ttg 1566
Leu Pro Pro Val Asp Asp Gln Val Ala Arg Leu Ala Leu Lys Arg Leu
460 465 470

att gat gtc agt atg gga gtt ata gca cct tta tca gag caa cta aca 1614
Ile Asp Val Ser Met Gly Val Ile Ala Pro Leu Ser Glu Gln Leu Thr
475 480 485

aag cca ttg ccc aat gcc atg gtt ctt gtc aac ctc aag gaa cta tct 1662
Lys Pro Leu Pro Asn Ala Met Val Leu Val Asn Leu Lys Glu Leu Ser
490 495 500

ggt ggc gct tac aag ctt ctc cct gaa ggt aca cgc ttg gtt gtc tct 1710
Gly Gly Ala Tyr Lys Leu Leu Pro Glu Gly Thr Arg Leu Val Val Ser
505 510 515

cta cga ggc gat gag cct tac gag gag ctt gaa ata ctc aaa aac att 1758
Leu Arg Gly Asp Glu Pro Tyr Glu Glu Leu Glu Ile Leu Lys Asn Ile
520 525 530 535

gat gct act atg att ctc cat gat gta cct ttc act gaa gac aaa gtt 1806
Asp Ala Thr Met Ile Leu His Asp Val Pro Phe Thr Glu Asp Lys Val
540 545 550

agc aga gta cat gca gct cgg agg cta ttc gag ttc tta tcc gag aat 1854
Ser Arg Val His Ala Ala Arg Arg Leu Phe Glu Phe Leu Ser Glu Asn
555 560 565

tca gtt aac ttt cct gtt att cat cac ata aac ttc cca acc gga atc 1902
Ser Val Asn Phe Pro Val Ile His His Ile Asn Phe Pro Thr Gly Ile
570 575 580

cac aga gac gaa ttg gtg att cat gca ggg aca tat gct gga ggc ctt 1950
His Arg Asp Glu Leu Val Ile His Ala Gly Thr Tyr Ala Gly Gly Leu
585 590 595

ctt gtg gat gga cta ggt gat ggc gta atg ctc gaa gca cct gac caa 1998
Leu Val Asp Gly Leu Gly Asp Gly Val Met Leu Glu Ala Pro Asp Gln
600 605 610 615

gat ttt gat ttt ctt agg aat act tcc ttc aac tta tta caa gga tgc 2046
Asp Phe Asp Phe Leu Arg Asn Thr Ser Phe Asn Leu Leu Gln Gly Cys
620 625 630

aga atg cgt aac act aag acg gaa tat gta tcg tgc ccg tct tgt gga 2094
Arg Met Arg Asn Thr Lys Thr Glu Tyr Val Ser Cys Pro Ser Cys Gly
635 640 645

aga acg ctt ttc gac ttg caa gaa atc agc gcc gag atc cga gaa aag 2142
Arg Thr Leu Phe Asp Leu Gln Glu Ile Ser Ala Glu Ile Arg Glu Lys
650 655 660

act tcc cat tta cct ggc gtt tcg atc gca atc atg gga tgc att gtg 2190
Thr Ser His Leu Pro Gly Val Ser Ile Ala Ile Met Gly Cys Ile Val
665 670 675

aat gga cca gga gaa atg gca gat gct gat ttc gga tat gta ggt ggt 2238
Asn Gly Pro Gly Glu Met Ala Asp Ala Asp Phe Gly Tyr Val Gly Gly
680 685 690 695

tct ccc gga aaa atc gac ctt tat gtc gga aag acg gtg gtg aag cgt 2286
Ser Pro Gly Lys Ile Asp Leu Tyr Val Gly Lys Thr Val Val Lys Arg
700 705 710

ggg ata gct atg acg gag gca aca gat gct ctg atc ggt ctg atc aaa 2334
Gly Ile Ala Met Thr Glu Ala Thr Asp Ala Leu Ile Gly Leu Ile Lys
715 720 725

gaa cat ggt cgt tgg gtc gac ccg ccc gtg gct gat gag tag 2376
Glu His Gly Arg Trp Val Asp Pro Pro Val Ala Asp Glu
730 735 740

atttcaaaac ggagaaagat gggtgggcca ttctttgaaa actgtgagag aagatatata 2436

tatttgtgtg tgtatatcat ctgtttgttg tgtattgcat catcattttg aacaaatgtc 2496

caaatctctt aagttgataa aagt 2520
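The CDS coordinates and the translation in SEQ ID NO: 1 can be cross-checked mechanically. The feature (154)..(2376) spans 2223 nucleotides, which is 741 codons: 740 residues plus the TAG stop. The Python sketch below uses the standard genetic code (NCBI translation table 1) to translate codons taken from the listing. The function names are illustrative and are not part of the sequence-listing format.

```python
# Standard genetic code (NCBI table 1); codons enumerated with bases in T, C, A, G order.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def cds_length(start: int, end: int) -> int:
    """Length of a feature given 1-based, inclusive <222> coordinates."""
    return end - start + 1


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence; '*' marks a stop codon."""
    cds = cds.lower().replace(" ", "")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


n = cds_length(154, 2376)
print(n, n // 3)                              # nucleotides and codons
print(translate("atg gcg act gga gta ttg cca"))  # first codons of the CDS
print(translate("gtg gct gat gag tag"))          # last codons, ending in the stop
```

The first call prints 2223 and 741. The translations "MATGVLP" and "VADE*" match the Met Ala Thr Gly Val Leu Pro and Val Ala Asp Glu residues shown in the listing.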



<210> 2 

<211> 33675 

<212> DNA 

<213> Oryza sativa 



<220> 

<221> CDS 

<222> (6924)..(7019), (7163)..(7269), (7344)..(7444), (7525)..(7634),

<222> (7694)..(7813), (7923)..(8153), (8253)..(8369), (8515)..(8589),

<222> (9012)..(9071), (9163)..(9225), (9328)..(9472), (9589)..(9730),

<222> (9951)..(10028), (10134)..(10293), (10694)..(10798),

<222> (11028)..(11129)

<220> 

<221> unsure 

<222> (1)..(33675)

<223> unsure at all n locations 
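The 16 CDS segments listed for SEQ ID NO: 2 can be checked for frame consistency by summing their lengths. The Python sketch below treats each <222> range as 1-based and inclusive. The names `EXONS` and `spliced_length` are illustrative.

```python
# CDS segments of SEQ ID NO: 2 as given in the <222> fields (1-based, inclusive).
EXONS = [
    (6924, 7019), (7163, 7269), (7344, 7444), (7525, 7634),
    (7694, 7813), (7923, 8153), (8253, 8369), (8515, 8589),
    (9012, 9071), (9163, 9225), (9328, 9472), (9589, 9730),
    (9951, 10028), (10134, 10293), (10694, 10798), (11028, 11129),
]


def spliced_length(segments):
    """Total length of the joined segments."""
    return sum(end - start + 1 for start, end in segments)


total = spliced_length(EXONS)
print(total, total % 3)
```

The segments sum to 1812 nucleotides. This is a multiple of three, consistent with a single uninterrupted reading frame across the joined exons.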

<400> 2 

cttaaccctc gccgactgcc tggagattcg tgccgatcga tacacgtggc agcgcctaac 60

gcgtaacccc tccctcactt ggagattcgt gcaagcaact cgattaatgc attaatgctg 120

tcgcgtaggt ttccctacgg aagagctgag tttcgtaacg aaaaaaaccg gccacgtttc 180

gcatcgagcc tactttaatt agcgtgggaa aataattcaa agtagcgacc tgtaccctgt 240
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ggcaacctag 


cgcgcgcggc 


catggctctt 


gttccgctcg 


tgacagtgct 


.cctgttcgcc 


300 


ggctcatgcc 


tcggatcagc 


gccgccgacg 


acatcgccgg 


cggcgtcggc 


ggcgtccacg 


360 


gcgacacgta 


cggtagtagt 


cgacggcatt 


acggccatct 


acaaacctcg 


gcgactcgct 


420 


gtcggacacc 


gcaacctcgc 


caggcaaggc 


gccaccggcg 


ggctgctccg 


gtacaccacg 


480 


aggcttccct 


acggcgtcac 


cgtcggccgc 


gccaccggcc 


ggtgctccga 


cggctacctc 


540 


atcatcgact 


tcctcggtga 


cgtcatcagt 


ttaatttctc 


tctctcttcc 


gtctgaaaaa 


600 


tggaagaaac 


aatattatat 


tacgttatat 


atatatgcgt 


ttttgtttcg 


gattaaattg 


660 


tggatatgat 


cgatcgatgt 


gcagctagag 


atcttggcct 


ccctctgctc 


aacccgtacc 


720 


tcgacgaggg 


cgcggactt c 


gcccacggcg 


tcaacttcgc 


cgtcgccggc 


gccaccgcgc 


780 


tcaacacgac 


ggcgctcgcc 


gccaggcgga 


tcaccgtccc 


ccacaccaac 


agccccctcg 


840 


acgtgcagct 


cagatttttt 


ttgttttaga 


gaagggtatt 


ttttacccgg 


cctctacatc 


900 


caaccggata 


tatacggcta 


ttgaagtagg 


gaacttaacc 


ctgtaaacaa 


tccatccata 


960 


gaggatatga 


acctaagacc 


ttgaggfcacfc 


acttcaaccg 


gatatatacg 


tgcagctcag 


1020 


atggttcaag 


gaattcatga 


actccacaac 


tagttctcct 


caaggtgaac 


gaacaaactg 


1080 


aaacgcattfc 


cagcttaatt 


tcgaccggtg 


cctgatcagt 


gccagtcagc 


aatgctgtat 


1140 


ctcacaaata 


attaagctaa 


tgtacagctt 


ttcagtgcta 


gaatgacttt 


catatagaga 


1200 


aatcttgtgt 


tatafcatata 


tacttttttc 


tgaaagaaaa 


aagttctttt 


gtgtgagcat 


1260 


tgcattgcag 


agatccgtga 


aaagctgtcg 


aagtcactgg 


ttatgctggg 


agagatcgga 


1320 


ggaaacgact 


acaactacgc 


cttcctccag 


acctggccga 


tggacggtgg 


atacagcctc 


1380 


ggcaacgtca 


cacgcatgat 


cgaaagcgtt 


gccacggccg- 


tcgatcttgt 


accggaagtc 


1440 


gtgcagtcca 


tagccagcgc 


agccaaggta 


cacaccattc 


ttttccatta 


afcttttggga 


1500 


ccttattttt 


aaaataataa 


tcctggctac 


aaagtaatta 


attaagaact 


aaattaat tt 


1560 


ttgtgggttt 


tgtgacacag 


gaggtgctcg 


acatgggcgc 


gacgcgggtg 


gtgatcccgg 


1620 


gcaacctccc 


gctgggttgc 


gtgccgagct 


acatgagcgc 


ggtgaacgcg 


acggaccggg 


1680 


cggcgtacga 


cgcccgcgga 


tgcctcgtcg 


cgctcaacct 


cttcgcggcg 


ctgcacaacg 


1740 


cgtggctgcg 


ccgcgccgtc 


gggga-gctgc 


ggcgcgcgta 


ccggggcgcc 


gcggtggtcg 


1800 


cgtacgcgga 


ctactccgcc 


gcgtacgccg 


cgacgctgga 


cggggcagcg 


gcgctcggct 


1860 


tcgacgagcg 


gcgcgtgttc 


agggcgtgct 


gcggcaaggg 


cggcgggggc 


gcgtacgggt , 


1920 


tcgacgtgcg 


cgcgatgtgc 

- — > — » ■ — * — * 


ggcgcgccgg 


ggacggcggc 


gtgcgcqqac 


ccqqgqaqat 


1980 


acgtgagctg 


ggacggcgtc 


cacctgacgc 


agcgcgcgta 


cggcgtcatg 


gccgagctgc 


2040 


tgttccgccg 


tggcctcgtg 


cacccgcctc 


cgataaattt 


cacgaacagc 


gcgcgcgcgt 


2100 


gaggcggtgt 


tgcatggctt 


gcgcgttttt 


tctgatcaaa 


actactcaag 


tttgagccgt 


2160 
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tttgatttat 


aaataaaacc 


atatgcgatt 


ttgctaaacg 


tttgtcgcgt 


gatttctctt 


2220 


cggaagaaaa 


aatctcaccc 


gagtgatgca 


taggcggtcc 


caaccatatg 


tgccctgacc 


2280 


tttctctgct 


tccttcgcgt 


cgtgcactga 


caacctcaca 


gtatgttttt 


ggtatgggcg 


2340 


cttgcggccc 


aactcaatct 


gtaatacatt 


gggctgtcgt 


attgggtttg 


ttggacttca 


2400 


tagactggat 


cggagaaagt 


tgggtaattg 


actttttcat 


ttttgctata 


aaatgattaa 


2460 


ttaaacagtc 


taggafcaatt 


actgtagact 


ctaataatat 


tgtttggtta 


agtattatta 


2520 


tacattcctg 


tatttgacac 


tctaagagca 


tggccaagag 


ttgcctgaaa 


gtctcttcct 


2580 


aaatctgcct 


ttcattctct 


aatgagaatt 


taaggattaa 


aaatatactt 


attttcaata 


2640 


gacagcataa 


atttaattcc 


ctagaataaa 


aaaatgcccc 


cctaacaaca 


gaaattagat 


2700 


tcctctaccc 


gcacctcatc 


agatcgctcg 


atttaagatc


acgccatctg 


acaccgccct 


2760 


cccgctcgct 


cttctctagt 


gtgggagtct 


cgcgctcaag 


agacggaaat 


cgggaacaag 


2820 


aatgattcct 


agcttagcga 


gaatgaaggg 


gaagacatat 


gfccataccta 


cacccacata 


2880 


agtatgccct 


agcacaaggg 


atgaaaacgg 


atcgaaaacg 


gatggaaact 


agctttatca 


2940 


tattcgtttt 


catttttttt 


tcggaatcgg 


attcgaaatc 


gaaaactcgg 


atacggaaat 


3000 


aaaattgaat 


attatcgaat 


acagatacgg 


agcgaatata 


agatggaacg 


aatacagtag 


3060 


cgaatattta 


ccggtatata 


aaaaacccct 


caaattgagt 


ttcttgatta 


agaaagagat 


3120 


atcgcttatt 


attttagtta 


aatatctcca 


acatttatat


cgtcaatttt


atagacggtt 


3180 


ccacaatcgt 


atgtgaaaat 


cgattttcat 


ggttgttcct 


ctaagagatc 


catatgcaaa 


3240 


tatgattatc 


attttctatt 


ccaagacctt 


ttactagatg 


tataacttat 


ttaccattgc 


3300 


ataaattgga 


gatgttattt 


attttacttc 


acatcttcga 


aacttgtaat 


gtatgtatta 


3360 


tactttaaat 


gctttcaagt 


acaaatgtta 


taaactacaa 


agtggtagat 


cccgttgagc 


3420 


tctacaactt 


tgatatggaa 


cacatctcca 


tcagatgtcg 


tttgaattgt 


agatctgaga 


3480 


ttttgtaaaa 


tttaatatgg 


tatattataa 


tgaatattta 


gacccttaaa 


tgaccttaaa 


3540 


taataaaata 


gtcaataata 


aagttgtaga 


tctcatcgag 


ctctataatg 


ttgatatgaa 


3600 


gtttgtcttc 


atctgattcc 


gtatgaaaaa 


gttatgtata 


tatacatgtt 


tttttataaa 


3660 


atttgctcaa 


tatctgcgga 


tatccgaaaa 


aaatttcgga 


tagtttttaa 


ccgtttttcg 


3720 


attccgatgg 


atagtatcct 


tactgtattc 


gttttcgttt 


ccgagaaaaa 


atatccaaat 


3780 


tcgtttccga 


atccgagaat 


ttttggataa 


ttccgacaga 


aactatccga 


atccgaaaaa 


3840 


^-£f9^ L-Cy^a c 


ggacggaaac




ague u catcc 


cggctzagcac 


gcactiaaaL 


3900


tcacatgagg 


ttgcacattt 


atctgaggta 


aaaagattgg 


aaacggttac 


tggttcgtca 


3960 


agaattttcc 


gtatttatca 


gtataactat 


tcaatgacga 


catcaacata 


acagaaaatt 



4020 


aaaacaacat 


gagtcgattt 


tatatataac 


tagaaacgaa 


aacagtataa 


ctgttacgaa 


4080 
aacactaaat


t Q a t crcrcr t c cr


aaaatttcca


ccaccrcft ttt


tatgcctacc


tttcaacgct


4140


cccaaaattc


ccaccraccca


aaaca tcrt crt


aQQacraactt


cccrcccacat


ggaaacaatt


4200


gtctcaaaaa


aaccrtcrccat


ctgctttgct


ccaggtcaac


a cat at crater


tgactgaact


4260


aaccatcatc


t caat at tcrt


cat ctaccccr


tcataccata


ccaccaaacc


aaaaaataat


4320


tataatcttc


acr c crcr c c crfc c


cr c cr c cr c crcra t


accttcrct cc


acaacaagtc


acTcccrctcaa


4380


arracaot - rp


cpt - ttcfocat


tcraacatcracr


crt t tcraccrac


aatatatata


tatatttaoa


4440


cacrcri - acre fc fc


t~ cri~ 1 1 caacrc


tcrca.pt acre t


aat taacrat c


aatctcctta


tcaaaatcac


4500


a a hraaaraf


cap} Pi Pi crt" apa


t~ crca t" crcra acr


aaa tcrt tcraa


atataataaa


ctaaataata


4560



LLLLULLL.LI 


l l '• ^ L- LO.L U-Ctct 


di^cLcti_ctU-t— ctd. 


y L L LULL L LCI 


h \~ •}- nfaaarra 



Ct U. y LCtClL-ClL. 




LLcl-L.c>c>Clcc. 






ar*i~c , cr , t~al~cr 


crt 1 1- r r cri~ t cr 


t"t"tacrcat" ci~ 



4 6R D 


t aac 1 1 1 caa


crca fcacat h t


tatcrat t tat


cttattacaa


aatataatta


tcatttattt


4740


tat catfara


aat"actttaa


aaataacatt


at cacrc tcra t


tttaaattaa


aactaaaatt


4800


acacctfaat


tacaatatac


t tea cat acre


aattataata


taactatata


caacttacac


4860


tataaattat


crt tcaaaata


1 1 tt t cct ac


aaaaactatc


accaaattct


taaacaatcc


4920


catfcccacca


cct cacrctcrc


ccrtaaaacraa


ctt tcr crate t


taaataaatc


caaatttatc


4980


tttttcrtttt


ctcaataaaa


t a tt ccraatt


atccaacaaa


tcaaaaaaaa


aacatccttc


5040


era t a a c c c a t


craat at t cat


craacrt tt etc


ctctaaccaa


taacaataca


aaacaatcaa


5100


acaatttfcat


ctcfcrct caacr


caccatctct


ccrcaccacrat


taaactattt


ttttttcata


5160


atacaataca


atcccatcrcc


crcrccaccraaa


aacaaataac


aaaaataata


aacaaacaaa


5220


acaacct* etc


t c cat cat era


actaataaaa


aataaaataa


aaacaaaaca


aaataataat


5280


crcra 3 t i~ .a ccra


pi err 1 cr c a t crcrcr


aaaaraacaa


acacaattaa


atcataacaa


aaaaaaccca


5340


tccacacctc


caac eeeaecr


ccatcaacct


tcccctccca


tcrcacccaat


5400


ctcatctctt


crcra ccccpicpl


ccrcacfccact


acccacaaca


acacaatact


5460


ccrtcf caccaa


cr t ccacaccra


ccrccrcccrccrc


aatacaaaaa


caccaacctc


taaaaataaa


5520


taaactaatc


c crcrt acraaacr


cccaccactc


actcaccaat


tcatcatcct


cttcaccaaa


5580


ct ccrccracrcb


c tccrcactct


crtct ccat cc


ccacatcaca


tcacctcacc


actactaatc


5640


t cot ccrccrcft


c crc c crcra crcrcr


cracrctacQacr


attaaaaaac


cttatctcta


cttcctaaaa


5700


1 1 tctacrtacr


ctt tcrtcrtat


crt at crt crt ot


ttatatatta


aaaaaacacc


aatcaaataa


5760 


atcctcctgt 


ggtggttggt 


tgggcgcaat 


tcgtgcttgg 


tttatttgct


ggaattctag 


5820 


cgggggagct


ggcgttgtcg 


gtgctaattg 


ctgcggggga 


gctgctggaa 


ttcgtgcttc 


5880 


tgcttgggaa 


ttagaaggtt 


tgggttttta 


tgattcagag 


ggctgtagag 


ctcttgagat


5940 


tggctgcgaa 


aattcgggat


ttgatcaact 


tagagagcat 


tatctttgga 


ttaggaggga 


6000 
tttttcttaa


tttttcttag 


ttttttttga 


gctatcaaga 


gttcatgcca 


tcttatttct 


6060 


ccctttgttc 


ttagccggaa 


ggatacacga 


atcagttttt 


tttttttaaa 


aaaaatattt 


6120 


atctcaattt 


tctgcaagca 


tgttcaattt 


ctaagtggaa 


atgctattta 


aaagaccagg 


6180 


cttattgatt 


ggtgctatac 


tttgattttc 


tttggaattg 


tagtagaagc 


atcagtttct 


6240 


tcatgctgtc 


ctaccaacct 


ctcttattat 


tagcaaagta 


aagttattaa 


atttgctaat 


6300 


tgttgatatg 


tcagtatttt 


gtacgaattg 


tgaaatagtt 


aattttcaat 


aactacacac 


6360 


catggttgtc 


ctgttgttgg 


actggaagca 


ataagggaat 


attccatttc 


tgtccattaa 


6420 


aacccacaaa 


gatgaccctg 


tgctcatctc 


taccattgcc 


atgcacctgt 


ttgtaggatt 


6480 


gcctaaccca 


gaagttggtg 


cttcgagata 


gccatggcca 


ccggagtggc 


accagcgccg 


6540 


ctcccacatg 


tcagggtccg


tgatggtggc 


atcggcttca 


cgaggagcgt 


cgactttgct 



6600 


aagatcttgt 


cggttcctgc 


tactctaagg 


gtgggctcat 


caagaggcag 


ggtgcttgtg 


6660 


gccaagagct 


caagtaccgg 


ttctgatacc 


atggagctcg 


agccatcttc 


agaaggaagc 


6720 


ccacttttag 


gtataactcg 


ccggctgttg 


ttcaccttgc 


atgtatattc 


gtgttagttg 


6780 


ttcttagtgc 


ttttaactga 


atgaacattt 


tttctgtaaa 


gaatctgaca 


gcatgtcttt 


6840 


tgcccttttg 


ttattcttta 


gttcccaggc 


aaaagtattg 


tgaatctata 


tatgagacaa 


6900 


ggaggagaaa 


aacccgcact 


gtg atg gtt ggg aat gtg cca ctt ggc agt gat 6953
Met Val Gly Asn Val Pro Leu Gly Ser Asp
1 5 10

cat ccc att agg att cag act atg acc acc tcg gat acc aag gat gtt 7001
His Pro Ile Arg Ile Gln Thr Met Thr Thr Ser Asp Thr Lys Asp Val

15 20 25 

gct aaa acc gta gag gag gtacactcct atttgaagtt ctatgtttta 7049
Ala Lys Thr Val Glu Glu 

30 

gtttttaatt ctatgcttga ataattgaat gctgggcatg cattaatcat gtgttctttt 7109

agatgttcta tgtttcatga ctagtgaaat aacgaagtat agcactggtc cag gtt 7165 

Val 

atg agg ata gca gat aaa ggg gct gat ttt gtt aga ata aca gtc cag 7213
Met Arg Ile Ala Asp Lys Gly Ala Asp Phe Val Arg Ile Thr Val Gln
35 40 45 

ggt aga aag gaa gct gat gcc tgc ttt gag att aag aac act ctt gtt 7261
Gly Arg Lys Glu Ala Asp Ala Cys Phe Glu Ile Lys Asn Thr Leu Val
50 55 60 65 

cag aag aa gtaagagtca tcatttttcc agattcagtg agttttcatg 7309

Gln Lys Asn


aatgaattct catcttgctt ttgcatttca acag t tac aac atc ccc cta gtg 7362

Tyr Asn Ile Pro Leu Val
70 
gct gat att cat ttt gcc ccg aca gtt gct tta aga gtg gct gaa tgc 7410
Ala Asp Ile His Phe Ala Pro Thr Val Ala Leu Arg Val Ala Glu Cys
75 80 85 90 

ttt gac aaa att cgt gtc aac cca ggg aat ttt g gtgagtgaaa 7454 
Phe Asp Lys Ile Arg Val Asn Pro Gly Asn Phe

95 100 

taatgatgtg tatcatttta gtgtcaatat cttatcaact ctgtgcatat gctgagaact 7514

ctacttgcag ct gat cgc cgt gcc caa ttt gag cag ctt gaa tat act 7562

Ala Asp Arg Arg Ala Gln Phe Glu Gln Leu Glu Tyr Thr

105 110 

gaa gat gat tat caa aaa gag ctt gag cat atc gag aag gtt cca aat 7610

Glu Asp Asp Tyr Gln Lys Glu Leu Glu His Ile Glu Lys Val Pro Asn
115 120 125 130 

atc tca ctc ttt agt gtt aat tta gtcagtaaga atgtgcagta tgtttcctta 7664
Ile Ser Leu Phe Ser Val Asn Leu

135 

cttgcatagc cacttccata tcatttcag gtc ttc tcc ccg ttg gtt gag aaa 7717

Val Phe Ser Pro Leu Val Glu Lys 
140 145 

tgc aag cag tat gga aga gca atg cgt ata gga aca aat cat gga agt 7765

Cys Lys Gln Tyr Gly Arg Ala Met Arg Ile Gly Thr Asn His Gly Ser

150 155 160 

ctg tct gac cgc ata atg agt tac tat ggt gat tct cca cgc gga atg 7813 
Leu Ser Asp Arg Ile Met Ser Tyr Tyr Gly Asp Ser Pro Arg Gly Met
165 170 175 

gtattatttc ctttctgggg atttcattca aataactttt cgtttcatgg atgtcttcaa 7873 

ttaatgatcg ttttgataga tgaatgacat gttctacaaa taatttcag gtt gag tct 7931 

Val Glu Ser 
180 

gct ttg gaa ttt gcc agg atc tgt cgg aag ctg gac ttc cat aac ttt 7979
Ala Leu Glu Phe Ala Arg Ile Cys Arg Lys Leu Asp Phe His Asn Phe

185 190 195 


gtg ttt tca atg aaa gca agt aac cct gtt atc atg gtc caa gca tat 8027

Val Phe Ser Met Lys Ala Ser Asn Pro Val Ile Met Val Gln Ala Tyr
200 205 210 

cgc ttg ctt gta gca gaa atg tat aac cta ggg tgg gat tat cct ttg 8075

Arg Leu Leu Val Ala Glu Met Tyr Asn Leu Gly Trp Asp Tyr Pro Leu 
215 220 225 

cac ttg gga gtt aca gaa gct gga gag ggt gaa gat ggg agg atg aag 8123
His Leu Gly Val Thr Glu Ala Gly Glu Gly Glu Asp Gly Arg Met Lys
230 235 240 245 

tct gcc att ggc att gga aca ctt ctg atg gtaattgcat ttttactttg 8173
Ser Ala Ile Gly Ile Gly Thr Leu Leu Met

250 255 

tgtattatat tgcatatatc atatctttcc atctgcaaag ggtaagcatg ccttatgtct 8233



tccttttgtt gtcttacag gat ggc ttg ggc gat aca atc cgt gtc tcc ctc 8285

Asp Gly Leu Gly Asp Thr Ile Arg Val Ser Leu

260 265 

acg gaa cca cct gaa gaa gag att gat cct tgc cgg aga ttg gca aat 8333
Thr Glu Pro Pro Glu Glu Glu Ile Asp Pro Cys Arg Arg Leu Ala Asn

270 275 280 

ctt ggc aca cat gcc gca gac ctt caa ata gga gtg gtaacgattt 8379
Leu Gly Thr His Ala Ala Asp Leu Gln Ile Gly Val
285 290 

attacctttc tctagtttta cacttttctc ttgtttagct gccaatgcca cacattaatt 8439 

ttgactattt ttagtagtgt tttgttctat ttgttctttt aagaatttct atttatatac 8499 

attatatgtt ctcag gct cct ttt gaa gaa aag cac agg cgc tat ttt gat 8550

Ala Pro Phe Glu Glu Lys His Arg Arg Tyr Phe Asp 
295 300 305 

ttc cag cgt aga agt ggt cag ttg cct tta caa aag gag gttagttcaa 8599 
Phe Gln Arg Arg Ser Gly Gln Leu Pro Leu Gln Lys Glu

310 315 



aataactcct 


atagtccata 


gttatcataa 


aaacaatagt 


gctagatttc 


ttattagttg 


8659 


cacttatgac 


agggtgagga 


agtagactac 


agaggggtct 


tgcaccgtga 


tggctctgtt 


8719 


ttgatgtcag 


tttccttgga 


tcagttgaag 


gtaactcaca 


tatttgttac 


ccttttgtgc 


8779 


aatgtgttga 


tcttgtgtaa 


ctttaccaaa 


atatatttca 


agacaatagt 


ctattttgta 


8839 


atatacaatt 


ctacaacatg 


atattttcag 


tagccatgtt 


ccatgcattc 


tatgcatagt


8899 


tcatagtaca 


tagtgagaat 


agcaatagca 


aaaagaaggc 


attgattttt 


ttctatctga 


8959 


atcaaatcaa 


ttgatgcatt


ttgtaatgat 


ggaaggctct 


cttatttttc 


ag gct cct
Ala Pro 
320 


9017 



gag ctc ctt tat agg tct ctt gct gca aag ctt gtg gtt ggc atg cct 9065
Glu Leu Leu Tyr Arg Ser Leu Ala Ala Lys Leu Val Val Gly Met Pro 

325 330 335 

ttc aag gtctgatcct tatagctgta cattctagca aacaactaaa ctttattggt 9121 
Phe Lys 

acttcagtct aaactgatgt taatttttct atgaatatca g gat ctg gca act gta 9177 

Asp Leu Ala Thr Val 
340 

gat tct att ctt ttg aag gag ctc cca cct gta gaa gat gct caa gct 9225
Asp Ser Ile Leu Leu Lys Glu Leu Pro Pro Val Glu Asp Ala Gln Ala
345 350 355 360 

gtgagttcct tcaacattat ttgttctttt cacaaatcac aagcttatat taacattcta 9285

ttcctttaaa atttttgtgt tgaaatctgt aaaatggtac ag agg ctt gca ctc 9339

Arg Leu Ala Leu 

aaa aga tta gtt gac atc agc atg ggt gtg ttg act ccc tta tca gag 9387
Lys Arg Leu Val Asp Ile Ser Met Gly Val Leu Thr Pro Leu Ser Glu
365 370 375 380 
caa ctg aca aag cca ctc cca cat gca att gct ctt gtc aat gtg gat 9435
Gln Leu Thr Lys Pro Leu Pro His Ala Ile Ala Leu Val Asn Val Asp

385 390 395 

gaa ctg tca agc ggt gca cac aaa ctt ttg cca gaa g gtagacattt 9482
Glu Leu Ser Ser Gly Ala His Lys Leu Leu Pro Glu 

400 405 

gaatttgata atgatctttg ttgttttgtg aattgtgttt atgtcatttt ctgtatttta 9542 

acattttgct tagtctgttt tattgatgaa tctttttttt atgtag gc act aga 9595

Gly Thr Arg 
410 

ttg gct gtc acc ctt cgt gga gat gaa tca tat gaa cag cta gat ctt 9644
Leu Ala Val Thr Leu Arg Gly Asp Glu Ser Tyr Glu Gln Leu Asp Leu

415 420 425 

ctt aag ggt gtt gat gat ata aca atg tta ctg cac agt gtt cct tat 9692 
Leu Lys Gly Val Asp Asp Ile Thr Met Leu Leu His Ser Val Pro Tyr
430 435 440 

ggt gaa gag aag act ggc aga gta cac gct gct agg ag gtaagtgaac 9740
Gly Glu Glu Lys Thr Gly Arg Val His Ala Ala Arg Arg 
445 450 455 

acagtaggcc agttaatacc actccctcca ttattaccat ttgttgggat gaaccgatag 9800

tcaattctaa gttacacatt aagcatgaaa aatgaaaatg gatttgactc tgcagaaaac 9860 

tgacatacag accaatgttt ccacctggtt ttccattgtt ctgtacttct ctttacctaa 9920

aattttattt tttttaataa tgttttgcag g tta ttt gag tac tta gaa acc 9972 

Leu Phe Glu Tyr Leu Glu Thr 

460 

aac ggt ttg aac ttc cct gta atc cat cac ata gaa ttc ccc aaa agc 10020
Asn Gly Leu Asn Phe Pro Val Ile His His Ile Glu Phe Pro Lys Ser
465 470 475 

gtg aac ag gtactatgaa gtgcttatta agagatgcat tgaccgccca 10068

Val Asn Arg 

480 

tccttacccc ttgaaattac tgtaccttta ttctcttgtg cttatttgag ttaaattata 10128

tgcag a gat gac ctt gtt att ggt gct ggg gca aat gtt ggt gct ctt 10176
Asp Asp Leu Val Ile Gly Ala Gly Ala Asn Val Gly Ala Leu
485 490 495 

cta gtt gat ggt ctt ggt gat ggt gta ctt ctt gaa gct gct gac cag 10224
Leu Val Asp Gly Leu Gly Asp Gly Val Leu Leu Glu Ala Ala Asp Gln

500 505 510 

gaa ttt gag ttt ttg agg gac aca tcc ttc aac ttg tta cag ggc tgc 10272
Glu Phe Glu Phe Leu Arg Asp Thr Ser Phe Asn Leu Leu Gln Gly Cys
515 520 525 

agg atg cgc aac aca aaa acg gtaagctgat gaattcttct ctgttagacb 10323 
Arg Met Arg Asn Thr Lys Thr 
530 535 

gtagatccca tgaacaacgt caacctttaa ctcgtgagat atcatgaaga agtgcaaaat 10383
tgcactttta acagtaaatg aaccttatag cctaccgaag aggataaata actttaggca 10443

attctctctt gtgaagcaga acattctttt ggcgatttct gaccgttaat taatgctgca 10503 

ggaatatgtc tcttgtcctt cttgtgggcg gacactcttt gacctccaag aagtcagtgc 10563 

tcagattaga gagaagacct ctcatctgcc aggcgtctct gtaaactctc ttacagacct 10623 

tctgcctccc ttgttttcaa tcgcatatta gctagcctga tggctaatca tgtctacatt 10683 

tgcctggcag att gct atc atg ggt tgc att gtc aat ggg cca ggg gag 10732

Ile Ala Ile Met Gly Cys Ile Val Asn Gly Pro Gly Glu

540 545 

atg gcc gat gct gat ttc gga tac gtt gga ggt gct cct ggg aag atc 10780
Met Ala Asp Ala Asp Phe Gly Tyr Val Gly Gly Ala Pro Gly Lys Ile
550 555 560 

gac ctt tat gtt ggc aag gtaacctttt cctatacttg tggaagttga 10828 
Asp Leu Tyr Val Gly Lys 
565 570 

atcatatcaa atggaataat ggaaatcacg gtatatcgtt gaacatagct gcaagtcaat 10888 

atttgtacat gatcatgcaa acacaatcaa cagtagggat gttaactgca tggcatatat 10948

atgctctttg agctgaaaca aaaacttaga gctgccattt tccttccatt aacacaagtt 11008

ctacttgttt tgggtgcag acc gtc gtg caa cgg ggc att gca atg gag ggg 11060

Thr Val Val Gln Arg Gly Ile Ala Met Glu Gly

575 580 

gcc act gac gcc ttg att cag tta atc aag gac cat ggc cgt tgg gtg 11108
Ala Thr Asp Ala Leu Ile Gln Leu Ile Lys Asp His Gly Arg Trp Val

585 590 595 

gat cct cct gtt gag gag tag gccgtagcat gtagttcata tatgtactcc 11159
Asp Pro Pro Val Glu Glu 
600 



tccataaaca 


atgttgtagc 


tgaggcacat 


tgtattgtat 


ccacggagta


cataaataca 


11219 


cgttctgtac 


atcagtttag 


aaataaagta 


ggaatagggg 


tggctgcaac


tttgtaacac 


11279 


cctcgtgaag 


catcggcaaa


tccaaattag 


aagcgtcctg 


aaatcagtga 


aaaagaattg 


11339 


atactgctat


tttttgtacc 


aattgaaaaa 


aaaaaggaat 


acatgatatg 


actaaatcat 


11399 


gggttacatc 


ttcgtcaaaa 


aatgtcacag 


cttacattat 


tttcactact


tgcaaatacc


11459 


agacgatcta 


ctggtgcggg 


aacttgacgg


gtgcaggaga


cgcgaagccc 


ttgtggtaga 


11519


gaagctcggc


catgacgctg


tacgcgcgct 


gagtcaggtg 


gacgccacgc 


catcccagct


11579 


gatctgctcc 


atctcgaagt 


tgtacttccc 


gccgcagcac 


gccttggtca


gcgccacgcc 


11639


gtcgaacccc 


gtgtcgcgcg 


cgccctccag 


catccgcacg 


tacgcgccgg 


agtagtcggc


11699 


gtacgcgatc


gtggcctccg 


gttatgaccg 


cctcagctcc 


cggatcccct 


gctgcagcag


11759 


cacgttgtgc 


atctgcgcga


acaggttgag 


acccacgagg 


cacccgttcc 


cgtcgtacgc 


11819 


cgcgcgctcc 


gtctcgtcca 


ccgccgccag 


gtagctcggc


gcgcaaccca 


gcgggaagtt 


11879 
gcccgggatc


accacccgcg 


tcgcgctcat 


ctcgagcacc 


tccctcgccg 


cgctcaccac 


11939 


gcgaccgcac 


cacctctggt 


acgagcacca 


ccgactccac 


cacgccggtc 


atcatgcgcc


11999 


cgacgtccgc 


gcggcgctac 


gacctcctgt 


tcgcggcctg 


ttcgcggcga 


tctctcccac 


12059 


catcaccagc 


aagctcgcgc 


cagcttctct 


tgctcgagaa 


ttttcagaat 


atgccaccga 


12119 


atatgcaccg 


ttttcaggat 


agaccactca 


attcgcacta 


ctttcataat 


atggcatttg 


12179 


gacgcgatat 


tttcttcgtt 


ccgtgacact 


ctcatccttc 


caccgtcagc 


gccagtaatt 


12239 


ccgttcgcac 


accaacagct 


ctctctgagc 


gtccagctcc 


agtgggggag 


ttgttggtgc 


12299 


gcggcgcggt 


aacaccgatc 


ctcgcgaggg 


ccgccgcgtc 


gagggcggtg 


gcgccggtga 


12359 


cggcgaaagt 


tgacaccgta 


ggagaagtcg 


gcgcctttgt 


cgatgtacgg 


gttgagcagc 


12419 


ggcagcccta 


ggtcgttggc 


gaggtagtcg 


atcatgaggt 


acccgtcgtc 


ggagcactgc 


12479 


cccgtggcgc 


tgccgatggc 


cgcgccgtac 


gtagggaggc 


gccacggtgt 


gctccatcaa 


12539 


ggcgaggaag 


ttgccggtgt 


ccgagatgga 


gtccccgaag 


ttgtagatgt 


ccgtgatgcc 


12599 


gtccaccacc 


gcccccttcg 


ccgccgatga 


caacgacgac 


gacgcggcct 


tccccggagc 


12659 


cggccttgcc 


tggcaagtgc 


cgacgaggag 


cagcgccaag 


aacgcgacga 


ggattggatg 


12719 


aaccggccta 


ctcgccatgg 


cgctcggtgc 


aagtgcaagt 


gggtgcgacg 


cagcagttgt 


12779 


tgtggcatgg 


cgcgcgcgcg 


gtgtggaatt 


cgattggaaa 


cgatttaagc 


tgagacatag 


12839 


tccaactccg 


aaacccaaat 


taaccataca 


tacagtgata 


caggtgaatc 


gacgagatga 


12899 


tcatgcacta 


cttaaaaaaa 


accgtcaaaa 


cacatttttg 


taggcggtca 


aatactctat 


12959 


gtacttaaag 


gcctgcgaaa 


ataacgcccc 


aaaagtcgtt 


tcttagtagt 


gatgcatacg 


13019 


caattgctgc 


aataacttaa 


aaagggtgat 


ttttattgca 


tcaacgtaac 


acgtacactg 


13079 


cattagtcct 


cctacattga 


aagcacaaat 


taaaccagta 


tggttgcaac 


ttgagacaca 


13139 


caaaggtgat 


cgatcgagaa 


ggttagctat 


aaacagcacc 


ccaaatggca 


cgaattaata 


13199 


atgtagttct 


ttctgcatgc 


tgaccaaaat 


ttcattttct 


ttttctctcc 


cctcgtcatt 


13259 


aaaaaaaagg 


tttaaagaca 


gaattacaag 


ctaattaatc 


atcagtggat 


cgagaattaa 


13319 


ttaagggatc 


acaatggctg 


caccccgcta 


tttcggagta 


gctagctcca 


tgcactcact 


13379 


catgcatgca 


ggcatgcata 


tacatgtccc 


ttgccatgtc 


ctatctaaca 


atttacacat 


13439 


ttcgacaaaa 


tgctcacggt 


cgatttggat 


tgtgtcactg 


acattaattg 


gttcatgcat 


13499 


ccacgcatgc 


gttactctca 


aggaaatatg 


aaagtatcat 


ccgtaatcag 


ggttccaaac 


13559 


taaggataga 


t accfctfc can 


nnimnnnnnii 



nnnnnniiTinii 


imimnriTiTi a a 


acctactaca 


13619 



gcaagtgcac 


ttctcctgct 


catgcttcag 


agcctgcacg 


cagaaagacg 


acacaaaatt 


13679 


caaaagttta 


tatcgcttct 


gttttggagc 


ctcggctaaa 


aaatgaaaat 


atgaacaacc 


13739 


aaaaaaggca 


acacgtacga 


gttctaacca 


agtatataac 


cattahaatg 


gcaaatgtga 


13799 
tctatacttt


tgtagacgaa 


gacaattaat 


gatagtacca 


gtgaatatgc 


tagctatata 


13859 


cttttatcaa 


ctacttatcc 


gatcaatatg 


cttcagcatt 


acaaactagt 


tcttatatat 


13919 


atatttcttc 


tatcttattt 


catctctaaa 


atacaaagtt 


tatagtgtaa 


agagatcccc 


13979 


agggatgaat 


atatcttcta 


acacacctcg 


tagttaattt 


gttccaaaca


atactagcat 


14039 


gcatataatt 


tgtagttatt 


tgtagcaaag 


cacggctatt 


tcgctaacaa 


atctaaatag 


14099 


aaaatatgtt 


atctctcagc 


cttgagaggt 


gtattaatta 


ccagcccata 


catcacttga 


14159 


gagggaaaag 


atttaaataa 


gacaaattga 


ttagaacaaa 


agggaatgat 


agacaatgtc 


14219 


ggtttttttt 


cgtttcttcc


tttccttcgc 



ataggctcgt 


ctagctggtt 


gcgttatgta 


14279 


acaaaacctc 


ttttcctttt 


aatatattga 



tgggcgcgcc 


ttttgcgcat 


tcacgaaaaa 


14339 


aaatgtaaat 


gtgaattttc 



aatcttatcc 


cctacttgcg 


ggattagtcc 



ttgtgaagaa 


14399 


atcctcaaat 


atgcgtacct 


gcagctggct 


ctgcagaccc 


ttgatgtgct 


caactgcaag 


14459 


gtccaacatg


tccgctgtgc 


ttgtttgctg 


ttgcaacacg 


aacataatta 


attactcaat 


14519 


tggttgcatt 


attcatgcgc 


aaaaaatgtt 


accgctaatt 


aatattagct 


agaactagat 


14579 


gagagaacgt 


acgacccctt 


tcatctatat 


acaataatca 


tgaatttgtt 


gagaaagcat 


14639 


gtttggtatg 


gtgttggagt 


tgtggctgtc 


atgcaccaaa 


gctctaatct 


cagtgcctat 


14699 


agaatttaac 


tacacaaaca 


tggatacgct 


ttttctagaa 


attctattag 


gttatgattt 


14759 


tgcgcttggt 


gtccatgaat 


ttgttgagca 


tgtgttaagg 


gacacttcac 


agtgcacact 


14819 


catgggtgaa 


tgcgtgtgca 


tttgccatgt 


ctattattaa 


ggcgagaaac 


atgaatctgt 


14879 


gtgctaatgg 


cacaagaaat 


gtggaaagtt 



tttttttaaa 


agaaaatact 


tagctaggga 


14939 


tgttcctttc 


ttcctcaaat 


atcatgtaaa 


tataggtatg 


aacattatgc 


aaagttcaaa 


14999 


tcgtaatggc 



caccttgtcc 


atgttgggca 


ccagctcctg 


cagcttcctg 


agcttctcgc 


15059 


taattctcgt 


cctccgttcc 


tacggacgcg 


catcgatcac 


accgacgtac 


atgctcatgt 


15119 


gtcaagatct 


gaagagaaag 


caaaagcaaa 


tatagaggcg 


ttttgatcat 


gatattgcgt 


15179 


acgtaccctc 


tccgcgatgc 


tcctggggtg 


cgtcgcgcag 


ccgcgcttgg 


cccgcacttt 


15239 


gaacggcacc 


tggtcatgct 


gcagctgcag 


gtacctgtcc 


atgccggcca 


tctccagcgc 


15299 


cgacgtgctc 


gccatgccgc 


cgaactgccc 


ccatttccaa 


cacgcccaag 


aaatcagaac 


15359 


acatcgcgat 


atatatatat 


atatatatat 


atatatatat 


atatatatat 


atatatatat 


15419 


cacaaacaca 


gcaaagctag 


ctactacttc 


ttcctctgtt 


ttacattaat 


tattataagt 


15479 


tgttttgagt 


tttgaataga 


ttcatacatg 


tataaatgta 


tgtgtttcat 


acatgtgtcc 


15539 


aaattcttat 


gaatgttagt 


aaatataaac 


aagggatgaa 


gagatcaaga 


agccttgtag 


15599 


tgtacaatga 


ttcaatgaag 


gtagccctag 


caatcaaatt 


tgccgagcaa 


tctttacctg 


15659 


ggactcgtac 


ccgccgaggg 


tggagatgat 


gtccctggac 


tcctcccacg 


gcccgacgat 


15719 
ggagaacccg


ccgccgccgc


tgctgccgcc


ggcggagaag


gtgcggggca


cggaggcctc


15779


ggcgccggcg


cggtcgggga


aggcgccgtc


ctccgcgatg


tgcgagaggt


gcggcggccc


15839


ggccgtgaag


ctcagctggg


acttcatctt


cctcccgccg


ctgctgctgc


cgctgccgct


15899


gccggccatg


gaagggtggt


ggtgggcttc


ggctccgctg


cctcccccgc


cttttgagcc


15959


tggaaagcct


gcttagttfca


ttgccaagta


gcaagcacgg


aaattaacta


atgatcgcta


16019


attagttaaa


ttaactgtgt


gtgtgagaga


aagagctact


gttacccaaa


cgctagttga


16079


aaactgccaa


gtgtgacaag


taaacaatag


tttacggtat


tagcataccg


ttagagctag


16139 


ct ctatacrcrt


acaccrtcrt t cr


acrcaa t aacrfc


ttaacct acra


t cr t cr a t crcr cr a


tcrt t caaac t


16199


tcrcttctcca


acrcrt tcraatcr


craafcaatcrtcr


tat tt crat tc


tacaatattt


tt ctcrtacrta


16259


exert oca cat a


attaaaotta


crat t taat t c


tcatcrttcaa


atcrtcrtat 1 1


a a c t cr c a cr cr t


16319


gtaatgttat


atatgcatag


tggfc tctata


aatattttca


taattaaaca


ctaccaaatt


16379 


tctatttgaa


atccatgtac


aaattaaact


tgactaatca


ccggttatta


tagttaaaca


16439


taacttaaac


cacaacaatt


accattcatc


aacactatgc


actactaact


aattaaaaaa


16499


aattacaagc


tagcactacg


aaattaaaag


tggcccggcc


gagttgcccc


agcacaaaat


16559


agcacgatag


atacaggata


tacttcctcc


gtttctaaat


attttacacc


gttaactttt


16619


tagcacatgt


ttgaccattc


atcttattca


aaaatttttg


tgaaatatat


aaaactatat


16679


gtatacataa


aagtatattt


aacaatgaat


caaatgatag


gaaaaaaata


atacttattt


16739


aaaattt fctcj


aataagacga


acggtcaaac


atgtttaaaa


aagtcaacgg


catcgaatat


16799


ttagaaacgg


agggagtata


tgaaaggaat


attctcgtga


ctagaaccat


atgttccaga


16859 


aagttgtact


ccatccattt


taaaatgtaa


ggtctatttt


gagtggtcac


aacrtat taacr


16919


aatatgaaac


ttacagaaag


atoagt tcaa


acgaccacct


t aat t acraaa


era crtacrt acra


16979


tcgttacftga


gacgaatatt


atatatatga


= agagacaaa


aacaattaaa


attagtgttt


17039


gcatttgcgt


tcatctttac


tagctattac


tagttactta


taagcacatc


gtcaaacatg


17099


tacttacgtg


ttgcaactta


atttctactc


cctccaattc


agtattggtc


gttttggatg


17159


aaaataatat


caaagttagc


aatccggccg


taaccatttt


ttcaaacctt


gtatgcccaa


17219


tagttacatc


gctattcaaa


tcaaaggttt


caaattttgg


attactattg


ggtcccaata


17279


gaagcccaaa


aagtattfcga


attttttaac


ttaggccccg


tttagttccc


taaatttttt


17339


ttcaaaaaac


atcacatcga


atttgtgaac


acatgcatga


agcattaaat


atagataaga


17399 


gataatccct 


catatgccac 


taaaaattga 


tctgatccct 


tatatgccac


taaaaattgg 


17459 


ctcctccctt 


atatgccatt


ggtctaaatt 


tgcgtaccct 



ctcatgtcac 


taccgtcagt


17519 


tgaccgtgtg 


ttgaccgtta 


actctcaagt 


aaaaaagaca 


tattgccctc 


tctgagttgt 


17579 


taggcatgcc 


ctatactcag 


aagggtaaat 


acgtcttttt


tccttaagaa 


ttaacggtca


17639 
acacatgtca


actgacggga 


gtggcgtgag 


agagtgtgca 


aatttggatc 


aatggcatat 


17699 


aagggaagaa 


ctaattgtca 


atggcatata 


agggattaga 


ccaactttcg 


gtggtatata 


17759 


agggattctc 


tctataaata 


aatgaaaaat 


ctaattgcac 


agttagggag 


gaaatcgcga 


17819 


gacgaatctt 


ttgagcctaa 


ttaatccatg 


attagccata 


agtgctacag 


taacccacat 


17879 


gtgctaatga 


cggattaatt 


aggcttaaaa 


gattcgtctc 


gcagtttcca 


tgcaagttat 


17939 


gaaattattt 


ttttcattcg 


tatctgaaaa 


acccttccga 


catccggtca 


aacatccgat 


17999 


atgacaccca 


aaatgtttct 


tttcgcaaac 


taaacaggcc 


cttagcaaaa 


tggttggtta 


18059 


tcaactttta 


aaatatgttg 


acagtgtctg 


tgacgacttc


atgacggtcc 


tctttaaagg 


18119 


tgcttatata 


gtgatagggt 



gtgcgtgtat 


gttcagagcg 


ttgagtatgc 


atgtgtatat 


18179 


atgcatgttt 


gtgtctgtac 


tgtgttaaaa 


aagaaaatcc 



caagatctag 


cctaaaattt 


18239 


tcattaaaaa 


cattgaaatt 


ttggccccac 


gattttttta 


ttccacaatg 


taaatttcta 


18299 


gtcaaattgc


tgcgaatgac 


gcgaaaatta 


ttttctgacc 


agtgaactga 


catgcacaca 


18359 


ttacactata 


tttattttat 


atttattttg 


aacgtaccta 


cgactacttc 


caggggatcg 


18419 


atcttattct 


cctcaaatta 


ataagaacaa 


gtactctctc 


catttcaaaa


tacaacaacc 


18479 


taagaatatg 


gataattttc 


ttcattgaat 


cggatggttt 


cttcggtttt 


tttgtactac 


18539 


gatgcgaaca 


gatggtatat 


tgaagcctac 


cggacacgct 


agcacgtgca 


tgccgcgtgc 


18599 


cggcccgtgc 


atatgagcaa 


gcctcgcacg 


ctgacataga 


cgcagccaag 


agagaaagca 


18659 


aacgccaaat 


caagaagccg 


agcaatcacg 



catgccatct 


caacgcaccg 


taggtcacta 


18719 


tctttagcga 


ggcaagaccg 


tgacgtcacc 


gtcaggccat 


cagcagagga 


gctgaacctg 


18779 


gacaaaccgg 


ggggcccacc 


ccgcaggcca 


agttgcggcg 



acacacacgt 


ggtccccgcc 


18839 


ttacattaag 



gcaagtggcg 


ccctaattaa 


tccattgatc 


aaaaattaat 


taatccacaa 


18899 


attaatcaaa 


tgccctcatc 



tttttctttt 


tgccttggct 


agggttcgag 


gcactaagat 


18959 


ccactggtaa 


tttaattgtg 


cttgctgtct 



tgatactaat 


taattgatca 


tatatgcgca 


19019 


agttggtcta 


tctagagcag 


aatctagagt 


gcaactggct 


gccgcattga 


aagaaatgct 


19079 


gctacatggg 


ctccactgaa 


agacatttga 


ctcttttaaa


ctttactcga


ggctattcct 


19139 


acctcgatca 


aagtataatt 


actaaattta 


gtactggtgt 


agtacttata 


tgtggatttc 


19199 


gacatttcta 


ctggtactat 


ttttatcctt 


accaattgtt 


gtatacaggt 


tgctcggtca 


19259 


aaaggccatt 


ttagatgttg 


gtatatatgt 


agtgtgaaaa 


ttaattataa 


cataactcta 


19319 


tgttcatatt 


gatctgcatt 


tcaaaaagat 


attgacacac 


ttattcctaa 


tttttgaata 


19379 


aatgatattt 


tgaagttttc 


attaaagggt 


tattatctct 


gtatgctcta 


aaacgttgaa 


19439 


tatttgtgac 


gcagaattaa 


tttaatactc 


atggaataaa 


taatgatggt 


gcataatttt 


19499 


gcaatgattt 


tcatcaaatg 


aggtgcatat 


aggtatcctt 


tatatgaaat 


gagaatactt 


19559 
gccaaaaaca


tttttaaaaa 


gagcttgttt 


tagctagcta 


ggttggtgaa 


tggtgatact 


19619 


aattaatcaa 


atgtacatat 


ttgtgcaaat 


cctggaagat 


gaatgcatgg 


ttttctagtc 


19679 


ttattatgaa 


caaattaaat 


tagaaaaaaa 


aacatctatc 


tctttgctct 


ctccactata 


19739 


gcttcaaatt 


gttttttttc 


cccatgtcta 


ctattgtagt 


gaagaatgga 


ttgtcatgcg


19799 


caatgacttt 


gcaactgaaa 


ataatggatc 


aaatgagaga 


gagggacacc 


aggtgcaagt 


19859 


ggcaaaaaaa


ctaagccatt 


tatagcaagt 


tgcaatagaa 


aataagacaa 


tctagagaca 


19919 


ctcgattata 


aaaagcgtac 


gtaaaaagaa 


taaaagcggt 


gtattcaaaa 


ccctagaccc 


19979 


cacatttcac 


tatcgatgat 


accctacttg 


agaaaacccg 


cctcctgtgt 


agcccatagt 


20039 


tttccatcat


ccttcttaca


cgccgagcca


aatttgtgca 


ctcctcgtaa 


taacatatgc 


20099 


cttaaaaact 


tgaactcata 


ttacattatc 


acgaaaacaa 



ttaagccgca


taatctcatg 


20159 


gatataacat 


ctcatggtgg 


atccttaatt 


aacagcttat 


atatatatat 


atatatatat 


20219 


atatatatat 


atatatatat 


atatatatat 


atatatatat 


tgaccctaac 


tgtggcaaac 


20279 


atgcattatt 


atcacacaaa 


agttactaac 


cacatatagg 



agcctatggc 



taatggctct 


20339 


gagtagaaaa 


atgggcacag 


aggatctcca 


tgatactatt 


tatggcaact 


cacgtagcaa 


20399 


aaagccgcag 


actaacacat 


ccatggatat 


ccacaacgca 


tactgatagt 


agtctgatat 


20459 


acacactagc 


tcctcccatg 



acggccttag 


cgaaaaccac 


tttttaaccc 


aaaaaaaaaa 


20519 


ccagttagga 


ccggtgaaaa 


gtcgcacgcg 


atgatcgatt 


cacgcgcgcg 


ccgcagaagc 


20579 


aacttgcaaa 


agggatcgag 


cttagctaga 



tagcgcgagc 


tcatcagcat 


ttcgtcgtcg 


20639 


ccgagcgagc 


tagtggcttt 


ggcagttagt 


agtgatggga 


gttgcataga 


agttaagaac 


20699 


caggtagaca 


gagatcgatc 


gattgatcaa 


acccgtttgg 


tttcggataa 



gtatgggaag 


20759 


aatctgaaac 


agtgtggagg 


aaacactgag 


agagaaagaa 


caccattaac 


aataatatcg 


20819 


atggaattcg 


ttttttttgg 


tggttgttgc 


tagaagccta 


gaacagcaat 


tcatgtgatc 


20879 


gatcgatact 


tcgatcgtgt 


gcgtgtgtga 


cgagaaagag 


atggggcatg 


tgaaggcaaa 


20939 


gacgaggttg 


acatttgcac 


agctagccgt 


tctctcctga 


cagaattaag 


ctagaaattg 


20999 


aagatccgtg 


actctgagta 


gtcctaacca 


attagctata 


cgcctataca 


cgatgggcta 


21059 


gctatgcacg 


cacgcgacgc 


caaattgaac 


acggatgaac


aaataaaatc 


gaacaatggg 


21119 


ttggctagcg 


caatcgatcg 


atcgatctta 


ccgttgctgg 


ccatgaggtt 


ggagaagaat 


21179 


ccggccggcg 


agctgctgtg 


ccgcgccagc 


aagtccacgc 


tcccgtcctg 


cagctgatgc 


21239 


ccatgccctc


cgccccctcc 


gccgccgccg 


ccgccgccgc 


cgtgcgggcc 


cagcgagatg 


21299 


tcccccccac 


cgaaccgcag 


ccccgccgcc 


tccgcctccc 


ttggctgcgg 


cgtcgtcgac 


21359 


gacgacgacg 


gctccacccc 


tcctcctcct 


cctcccccca 


ccggcagaaa 


cctcctcatc 


21419 


atacccctcc 


ttggatcgat 


cgatcaactc 


caccccccgc 


gaccgagacg 


cggcctctcg 


21479 
tcgatcgatc
atttcggcgt 
agaagagggg 
aagtgggagg 
gggcgtgtgg 
gcggagtgga 
cgcgcgcccc 
ctagattttt 
aaaggatttt 
cggggactag 
cagtctttta 
actcgtagga 
caatttggtg 
ctttttctac 
ttgaaattgt 
cggaaaaaat 
ttagggagcg 
tatgtctaac 
aactcacggt 
aggtcaaaat 
accatatgtt 
accatgtttt 
caattccact 
agcaaacttg 
ggagtaatat 
cgcagtaatt 
atgattcggt 
agtcaaacgg 
caaaattgga 
taccaatgta 
gctttgctag 
attttttaaa 



tgcagctcgc 
gaaaattaac 
aagggaaggg 

agagggtggt 

cgccgtgggg 

gtgggcttct 

gccccggaac 
gtgttggata 
tttggtgtgg 
aggatgaact 
tatttacctt 

ggggaggtac 

gtcgagctga 
gacgctaaca 
tcgatataac 
gtttgaattt
gttttttaac 
agtagtataa 
gttccccgca 
ttaggaatga
ttgctcacta 
tgtggtgacg 
ttggtctatt 
aaaacctagt 
cataagagga 
cattcgatag 
cgcacggtgg 
acttcggttt 
gtcgcctgat 
cagtgaactg 
ctagttaggg 
atatgagttt 



gcaggcgcag 
aaaacgacgg 
aagggggagg 

gggattttaa 

accagcggac 

gcactgcgca 

cggcaggcat 

tatgatgctg 

cttagatttt 

cgataatcaa 

ttgtgatatg 

gtagttaacg 

cctatgttcg 

gactgattat 

tggtttaagt 

cacttgtttt 

ctcggcacat 

ttttatcaca 

aaaaataaaa 

aatgaataat 

gatatgacaa 

cttagttaac 

ttgtctattt 

acatctaaac 

agacaacaaa 

attattagta 

ttgtgaagtc 

tggtcaggta 

catgtgtgcg 

cgttttgttt 

tgattcctat 

ttttattgtg 



gtaggcaggc 
gggcggccta 
tgaggtggtg 
agggaagcga 
cggccgggcc 
gcagcagcag 
ctctctcggc 
atcgaggaaa 
tggatgcttt 
tggtggtggc 
gaggaaacaa 
gcaaagatcg 
cccatcctct 
cacagtcatt 
tcaaacaaat 
caaccgttat 
ccgtaaactc 
atgatttgtc 
aataaactca 
caattgggtg 
ggaaaaaccg 
tcatacatca 
gaaatcatgt 
ctagctccac 
aaataggata 
attcattcga 
ggagccatga 
aagttgtgtt 
gtggtgtgat 
taaggctgtt 
tttttgtcag
aataatgagg 



ggcgcgtggt 

tactatagct 

gaggtggtgg 

ggccccgtga 

cgggcaagtg 

tagcaagccg 

ttttcgctgc 

gggaaggaag 

ctttcctctg 

ggcaaatgtt 

gctggtttgt 

atcatgcaag 

cgatactttt 

ggatagatcg 

ccaagctaaa 

tgctgttagc 

tattgcaggg 

tctttacgag 

cggtatgtgt 

tgaatgggtc 

aaccatcaat 

attataatct 

ttcagctatc

tagtgtggtc

gagatagtct 

taagatatga

tatgtggcat 

ccttggttct 

gacgaatgac 

agttttgttg 

gtcttatgaa 

aacaaatgaa 



gtgggtggaa 

agtagaggag 

ggctaggcgc 

ttggttctcg 

gatgtctcgc 

taggtggcgt 

atctttggtg 

aagaaaaaaa 

ctgcggactg 

tatacttcct 

ggtgttgtgc 

ttggttgggt 

ctcatctaga 

acatggtcat 

ttttattttg 

gaccttgccg 

gagtcatgtg 

ttgtattata 

aaatggaatt 

aatgcactaa 

aacactggaa 

tttctctatc 

ttctaagtaa 

caaaagcagt 

tagcttgtgc 

taatgatgaa 

cgaaagcatt 

taattcttat

ggcgagtttt 

tcgtggttat 

agttaaaaat 

gttttgggag 



21539 
21599 
21659 
21719 
21779 
21839 
21899 
21959 
22019 
22079 
22139 
22199 
22259 
22319 
22379 
22439 
22499 
22559
22619 
22679 
22739 
22799 
22859 
22919 
22979 
23039 
23099 
23159 
23219 
23279 
23339 
23399 
gatacatggc
cgaaggaaaa 
agaagagttt 
catattcagt 
ttttctgtaa 
aaataagatt 
gcttaaaggg 
agatgttgtt 
gcaaaatttg 
cattgtatgt 
tagaagcttg 


aatcactatc 
gggcacaacc 
ttgagacgac 
aataattgtg 
attctatttg 
ttgagcaaag 
attgttgtat 
ttttagagaa 
ctaattattt
tctttttaaa 
agaaaaggcg 
aaaccctagt 
aattaatgga 
tataccaaat 
tcagctaatt 
ttctacaatg 
aactatatct 
atcttctgaa 
ttgattaatt 
acaactttca 
ccacgaaaat 



tagaaaacat 

atttatctat 

tagttggtca 

tgaatcaatc 

ggtcaatgta 

atttagaaac 

gttatgttaa 

tatgtatgaa 

tttggggagc 

ttgatggttg 

tactatatgt 

tatttcattt 

gtaaaagtgt 

ggttatgcaa 

tgctccttat 

aactacactc 

gccgagatgt 

ggcggtttta 

acacccagag 

gatattaggt 

ctattcatct 

aaaggtgtga 

tgttgtcacg 

taaggctcca 

gtggtccgtg 

aaagccagtg 

taatccattc 

atttgttaat 

gataaatatg 

agattttaat 

tatagaaaat 

ctagaactta 



ggcttttaaa 

tgctaatgcc 

gaatcttgta 

tactgactat 

tctagcctta 

ttaagatgcc 

caaagacaat 

aaatgagact 

gtttcagtac 

gctttggggg 

taggctctgt 

atgcaggcat 

gatgaagatg 

ctttttgcca 

attggtctgg 

tttghtaagt 

tatctattcc 

gcccgattgt 

gtcttccggc

acttcactaa 

tttctttaat 

atatgcatga 

tgactctcaa 

gctaagtagg 

cgacatgttg 

tcgaatactt 

acggatgaaa 

taagaggttc 

aagatcaaat 

tattacaaac 

tttcacacga 

atctgccctt 



gataaatatc 

atgggatctg 

tattggcatg 

tttagatgga 

agccttaatt 

gcttaaaatt 

ttggcaaaac 

attcaatatc 

tcttttggtt 

tgaacaagaa 

gattgagtag 

tcttcagagc 

gagagctttt 

actatggatg 

tcattttttt 

gctagattgt 

attattaaaa 

tctaaatcaa

tgggttagat 

tattcgtatc 

atagcactaa 

aagatcgagt 

agtccatttg 

cgggaaaaga 

gtccataaaa 

atacagtata 

aagctgtgcg 

atatcttggg 

gttttacgta 

ttaaaaaaaa 

aacgcaccgt 

tgttgggttc 



catctttata 

ttccgcttaa 

aattgcgtgc 

attatcatca 

aataatggtt 

aagattttta 

ggaattggaa 

tttttataga 

tataccttcc 

aaggagcaaa 

gaatgatatg 

aacatattgg 

gaaagttata 

gagattcaca 

tatgtttagg 

aataattggc 

aaaagctagt 

ctgaatatta 

gaccttggtc 

ttttttaaat 

attaaccgtg 

ggacaccccc 

aggacttact 

tcaaacgtgt 

gggcatatga 

gttttcgaaa 

cccaacagct 

cacacaaagg 

aaacgaggtg 

gattaatctg 

ttaacagttt 

tcgaacagga 



ttatabagtt 

tgtttctttt 

ttctattgta 

aaatggttta 

acattgagag 

tgtggtactt 

tggcagctta 

ttgtcatttt 

tactttcata 

ctaattcttg 

atttttgaca 

ctccggtttt 

tgtcgtaagc 

aatagactta 

tgtgtgttta 

tgtagctctg 

tagtgtattt 

atttgctctt 

cttatccctt 

taatttgctc 

atctttcaaa 

caaaaaaaaa 

aactgtttga 

tcagtggatt 

aagtttcctt 

taagttttac 

atagctatac 

ttctgtttga 

gtaataactt 

atattttata 

gaaaagcgtg 

ccaaacttca 



23459 

23519 

23579 

23639 

23699 

23759 

23819 

23879 

23939 

23999 

24059 

24119 

24179 

24239 

24299 

24359 

24419 

24479 

24539 

24599 

24659 

24719 

24779 

24839 

24899 

24959 

25019 

25079 

25139 

25199 

25259 

25319 
tgtccatact



ccgtactgta 


cataccaact 


atactaaata 


tcgctaaaac 


gttttaaaaa


25379 


tattatacat 


atactttcaa 


tactattata 


cgtatgcgta


aagttttatc 


ctcaaattca 


25439 


ttatatttca 


tacttaaaaa 


aaattctaat 


agctttatga 


atataagtct


tagatttttt


25499 


tctccatata 


tatatatgat 


aaatttaaag


stgggacttc 



acgcgtatat



ataaatacta 


25559 


tttaaagtac



atgtacattt 


ttctaaaaaa 


ataatatttg 


ttagtttgta 


tacattgtgt 


25619 


gtatacgtga


aggctcacgt



agacattttg 


cactctcaat 


tatttatact 


agactaataa 


25679


ccacctaaat 


attgttctta


gcggttttga


cttgagctta


cctacgagat 


gccaacgtgt


25739 


cagtccagtc 


agcaaaaaag 


ttttaaaaaa 


actccgtggg


cccacttgtc 


atacttctcc


25799 


ctcaatctaa


cgcctccccg


gtcaccctac


tctctcttcc


tcgtgcgcac


gctcgtgcgg



25859 


ccaacggcct


agtggtgcgg



tgttggtgtt


ggtggtgcgt



ccgcgtcggg



cacgacggcg


25919 


gtgttgccgg 


agagatggag 


ctacgcaaga 


ggcgggcgtc



gacaagtatg


gaccccagga


25979 


ggtggaggct



acgcgcgagc



gcgbcggccg



cgccctttgc


cttcaacggc 


gacgacagca


26039 


ttggccgctc


gttctcggcc


tcqcctctct 


gaccgcaggt


cggcctcagc


ctagcagtag


26099 


tcggccacgc 


acgttggcct 


cattttcgtc 


accgtgttct 


tggggctggc


atgcaggcgg



26159 


gagaaggagg


gatagcggca 


tggactgcgt


gtgcgcctgt


gcagtgacct


gggtggatac


26219 


tatgtcaagt 


tggagctctg 


caagtcggcg



ctctgcgacg


gcgacggcaa


cagggacaca


26279 


tcgtcgttgt 


caccgtgctg


cgcggcacga


gaaagatggt


ggcactgacg


gtgaggcttg


26339 


cgatgacaaa 


tgtgatggtg


acaagggccc


aacgtcgagg


tcgttgccct


ggaatagaab 


26399 


ccgatcggca


gctctagcgg



tggcaacggc



tcagtgctgg



cgctagagta


cgagcgcggt


26459 


cggtggccac



agggcggtac


cgcggcgcgc


gggaggtgct



ccaggcggcg


cagtgctccc


26519 


cccgtgagaa


cgagcggttt


agcatcatgg


ccatcgtcgc


gcaccgtcac


cgccctgggc


26579 


ttctctagcc


gaccacgctc 


cacctcaagc


aacccgggga


gtagtctttg


ctgccaccgc


26639 


ggccttctcc


tcactgctcg 



gttcaaggat


gagagagaga


tcgaggtgga



aggaggtaga



26699 


agagatgagg


tgagaatata 


tggatcactg 


acaaatcfcrqc 


ctgttatttt


ttgccgcgtt


26759 


agaaatgcca 


agtcagctaa 


cctagcctaa 


aaccgtccaa 


aatagtgccc


cggtattcgt


26819 


ctggttttaa


gagttttgag


atattaaata 


caatatatgt


tattatagtt


tagagggtaa


26879 


attgtactac


cgtaccataa


tagttcgggg


gtaaattgta



cttcctctgt


actcataatg


26939 


gaagtcgttt


aggacaatat 


ttaagtcaaa


cattgggaat


ataaatcatg


aataactctc


26999 


aagttgttga 


gtttgaaaat 


gtaaaaatta 


tatgaataga 


tttttcttga 


aaaatatttt 


27059 


cataaaagta 


tacatatatc 


actttttaat 


atatattttt 


atagaaacaa 


gaagtcaaaa 


27119 


ttatgttttg 


gagaccgtgt 


cgctgtccaa 


aacgagtacg 


g^gggaatac 


tttttactcg 


27179 


tagtttacaa 


tatcgatctg 


ttaactgttt 


ataagagtat 


ttggatccat 


gcagtattgt 


27239 
agtagtagta
gagacttaac 
tacaaaacta 
atctataatt 
tcaaaagatt 
gtttaatatt 
actaccattt 
tagtgcgttg 
ggtcacgacg 
tcgaaaattt 
cacaggcgtt 
cctgggtggt 
cgcccacacg 
cgtacgtcgc 
taaaaccaaa 
ccctaaataa 
agcccccaat 
accccacagc 
tcatcctccc 
gtcgttttgc 
ccatggtgtg 
cgacctcctt 
ctcgacgtcc 
cgacagctat 
gattaaaaca 
tttgatcttt 
gtacattgtc 
ttgcagagca 
gcgcaaccaa 
tgcaatcact 
tgcagtgact 
gacactgatg 



gcagtacatt 
tttagtcttt 
attatataaa 
agagaatgtt 
tgtctcgtga 
tataattaat 
aaacagggtc 
tgtcagtaaa 
cagcactgca 
tgttccgtag 
cctctccgcc 
gcatccgttt 
aaccacgttg 
tgtccaataa 
gaaaattctt 
gttttgagag 
ctagggaggc 
ccccatcctc 
tcccgcgtca 
tccggtcggc 
ccgcctcaag 
gcatcttcgg 
tccccgaact 
ctatcaccgg 
aatcacaata 
atgggcttac 
atcattcatg 
tatataatat 
tatatatgct 
tgttgatgtt 
agattgcaat 
tcgaagtgct 



tgagaatatt 
gtatttagac 
tgaaagctaa 
tattgtagca 
attagtctaa 
tttcaaacat 
actccaatgg 
tttcgtacta 
gcagggctgt 
ctaaagcccc 
ggattccgga 
tctgacaggt 
gctttcggcc 
aaagttttaa 
aattacttag 
ttgatgcaaa 
ccctagatca 
tttttttttg 
tcgccatcct 
gaccacccga 
ccatcacaat 
ccaacgcacg 
agctgcccac 
ccgcacacat 
gtaaagttca 
taggcgtcta 
ttttatatgc 
ttaagaaata 
aaataaatac 
tctgagatag 
gacaagtgga 
aatgccaaat 



agagtacgaa 
actaatttag 
thtgcgagac 
tcatataggc 
gattatgaat 
ctgatgtaat 
taggtgaaat 
gtaccacgag 
agcctgtacg 
cccaaagcca 
aagaaaaaag 
gcatgcacct 
aacttgcccg 
caccaactat 
agcatctcca 
aaaatatagg 
ctcctccaag 
gcgggggaaa 
cccacaacct 
catccctcgg 
catcggtttt 
tcgtcaccga 
ccgctgtcca 
gctgcagtaa 
gtttcgtatg 
ggcccatcta 
agtgtcttgt 
aatttgtgtt 
atattgcaaa 
attggaaagg 
ggtgattcct 
gaagaccttc 



ttaggtggtg 
aatattaaat 
aaatttttta 
taattatgga 
gagttttatt 
agggacttaa 
tcaacagctg 
acagctagac 
ggaggcgtag 
gccgcggttt 
aaaaaacaag 
ctcgctcgct 
attctttaat 
agtaaccagc 
acagggtcct 
tccagcagat 
cccccagtcc 
tttctgagcg
cccagcgagc 
gaagaaactt 
actccgtcac 
ttgatcatgc 
caaggcctag 
aaaaattcag 
tctgagtctc 
aatcattcgc 
tctatgtcag 
tgcactgagt 
cagtataacc 
ttgtcaattt 
ttgtgcgcat 
gtacttcaac 



tttggataca 
atagactact 
agcctaatta 
ttaattaggc 
aatagtctac 
aagactttta 
ggaaatgcac 
agacacgtca 
gcgcaacatc 
tcatggattg 
atgtccgttc 
accgcggtag 
cccctcacga 
ttaattttaa 
caaacaaagt 
tccctactag 

ggggggctca 

cgcgccatcg 
aagccgccag 
cggcgagatc 
tcactgtcgc 
ccaggtgagt 
cgccgaccat 
agatgatggt 
cttgtttgat 
acagcaaaac 
agagctaatc 
ccttagtact 
tgatgtacat 
atatatttat 
gatgtccgag 
aaatggtgca 



27299 

27359 

27419 

27479 

27539 

27599 

27659 

27719 

27779 

27839 

27899 

27959 

28019 

28079 

28139 

28199

28259 

28319 

28379 

28439 

28499 

28559 

28619 

28679 

28739 

28799 

28859 

28919 

28979 

29039 

29099 

29159 
aaaggaagtg


ccaaaagatc 


aagcaactat 


actcataagg 


aggacattca 


attgtgcatt 


29219 


tcatggcaga 


gcattagctc 


agatcctatt 


attggcaatg 


agcaaccagg 


gaaggcatat 


29279 


tggcagagga 


tcgcagagca 


ctaccatgct 


aaccgtgatt 


ttgagtctga 


taggaatgca 


29339 


aactctcttg 


agcaccattg 


gggtaacatt 


cagaaggaag 


taagcaagtt 


tcaaggttgc 


29399 


tacaatcaaa 


ttgagcgtcg 


tcatccaagt 


ggcataccac 


atcaagagct 


tgtaagttaa 


29459 


attgtttatt 


tattattatt


aataacaatc 


ttgtatgtat 


gtgaattaaa 


acttaaatta 


29519 


tgttgcaggt 


tcttgaagct 


gaggcattat 


actcgtccac 


tgcaccaaag 


aatagggcat 


29579 


ttcagtttaa 


tcattgttgg


ctcaagttga 


ggaattctcc 


aaagtttcaa 


acactagaat 


29639 


cccacaagag 


gccacggtct 


aggaagtctt 


cgaccccaat 


tgagagagct 


ggtgaagaag 


29699 


atgaaggaga 


tgatgctagc 


aagagtacag 


ctcctgattt 


atctcagccg 


agtgctaaaa 


29759 


agagaccaat 


aggtaggaag 


caagcaaagg 


aaaagttgaa 


gaatggagga 


caagatggac 


29819 


catacaaaga 


ggcgatgaaa 


gatttgcttg 


acgctaaaga 


gaaagaagcg 


aaattgaaag 


29879 


aagagagatg 


gaaggaaact 


aaggagattc 


aagagcgcaa 


gctcttattt 


gctgagcgta 


29939 


agttagtgtg 


ggatcaagaa 


cagaagatta 


tgttttgtga 


tgtttccacc 


ttggaaccgg 


29999 


atgtgagaac 


gtatgtgttg 


gctatgaggg 


cacagattgc 


agcttcaaag 


gtggctgccc 


30059 


tcaatggtgg 


atttgatggt 


agtagtggct 


ttggaggtga 


gtttggtggc 


ggtaatggag 


30119 


aagtttgagc 


acttcgatgg


aataagttgg 


attctattgg 


atgatccatg 


tgtcctttac


30179 


tagtaggata 


tgccattatc 


acgattggtc 


tttggagtcc 


ttttttgtta 


attatttcca 


30239 


caataatttt 


agtgtcactt 


gctagtagga 


catatattac 


tttcagattt 


gttatttata 


30299 


atcgaatcat 


tcatggttgt 


aggatgtatt 


atttttaaat 


tatataatgc 


atcattgggt 


30359 


tcacatagtg 


tattttttat 


gagcaatttt 


cattttcatt 


ggtgaattac 


gaatcttggt 


30419 


tgcatcttgt 


tgtcgtatat 


ggcactgtac 


ccataccata 


tttacatgtt 


taaaaatttt 


30479 


aattttgtat 


tcgaattgta 


gtgtttgaaa 


ttgtgaattt 


aagtatggtt


aaattatgtg 


30539 


agttagaaat 


aattgtgttc 


gaatttttgt 


ggtgttaaac 


atactgtata 


tggattgtat 


30599 


tttaaaatac


aagataaaca 


tgagtaggga 


ctaagaaata 


ggggctactg 


ctggagttgg 


30659 


aggcattttt 


tagtccttga 


gaaatggggg 


cagccctcat 


ttaactttta 


gacgcttcaa 


30719 


aataaggtct 


attgctggag 


atgctcttag


gtccccatcg 


tttccttcaa 


tcagcattag 


30779 


ccgctaccaa 


aatttgaaat 


tttaaagttt 


ttcatcgaag 


tttattttcc 


agcattggta 


30839


[positions 30840 to 30899 illegible in source scan]




atacgccgaa 


tggcgtatta 


tatgtatttg 


gccaaaggat 


gggggcctta 


aaccttagcc 


30959 


ttagtcgtgc 


cctacaaaag 


acacacgcct 


cgtcagggca 


agggtactcg 


agcgtggagg 


31019 


catggttcgc 


aagccatggt 


cggcgaggcc 


atgctctagc 


aatgcggtgc 


agtccacctc 


31079 
ctctccgagc


gcggagctcc 


aacgggtgat 


ggccaatgaa 


agaaggagac 


cgacttgccg 


31139 


ttggttgtag 


catgtaaatt 


tcttgcactt 


tcttaataaa 


tttcggctag 


tgttcgctag 


31199 


ctcgaccaaa 


aaaaagagag 


gctaatgatg 


gggttaggaa 


gtgaaaacaa 


gcgcagtgga 


31259 


ggagaagaag 


atcgagaggc 


ctatttgtat 


gatgctttgt 


cgatgtagat 


ttagtcccat 


31319 


gctcatctca 


tccctcagcc 


acaacaatcc 


catcattgta 


gagctcatca 


gcttgctcta 


31379 


ccatctctcc 


ttgtttatgg 


gccactccca 


acctgctacc 


catcgcctga 


tctatgaatc 


31439 


tagctgtcaa 


tgacctcatt 


ggccctagtc 


ttgagatcac 


ccagtggatc 


ctttggcaaa 


31499 


gtggatccgc 


ctttgttttg 


ctttggagaa 


agaaaacgat 


gacttagcta 


aagatctcgt 


31559 


cggtcaaaaa 


gagagatgcc 


tttgatatat 


gctgaaaaat 


agaggagagg 


cagtgtcagc 


31619 


tggagagctc 


tttatccaca 


cccgtgggga 


tcgagcttat 


tggcgtagag 


ggagagacat 


31679 


tgagggagag 


agagtgcaag 


gggatttttt 


tgtaatttct 


agatttggtg 


gtgtttagtg 


31739 


caatactttg 


aacttatttg 


taaattaagt 


aaaacatgat 


tgtaatagaa 


aatatcataa 


31799 


actgacatag 


aaaaacaaag 


ataacaattg 


aagccactag 


cgctatggag 


aaaatgtgtg 


31859 


acctcggtct 


acatataacg 


gctatgtgtt 


attaccatgt


cacttchaaa 


actaccatat 


31919 


aaccatatac 


gtttttctcc 


tacttatcaa 


aaatataatt 


aacaaatttt 


tttaccggtt 


31979 


tagtttacaa 


gaaaaaagtt 


tgactgcatt 


gttgataccc 


taccatcctt 


gtacgaaggc 


32039 


aggcgctaca 


caacaccgct 


gccgctgccg 


tcgccgccgt 


aagctaaggc 


tgtcacgccg 


32099 


gcgaccggcc 


acggccgacg 


tggaaagcga 


cctaatctgt 


aaagtgtaaa 


cccaccctat 


32159 


agaaaaaccc 


ggttggtggg 


acgagaatca 


ccgaatcagc 


gtcgacgacg 


acggccgacg 


32219 


actccagcag 


cgggggtcac 


gagactcgga 


gccgagagag 


agaaagagga 


ccacgcgcgc 


32279 


attcactcaa 


ctgcataaaa 


aaacccccgc 


gcggcggctg 


cgcagtcacg 


tctacgctcg 


32339 


cgggatcgct 


cgatgaaatc 


aaccaaaatc 


ttaaacaaac 


cgaaccaacc 


aaccaaccgt 


32399 


cgcgcgtgtg 


cgcgcgaggc 


gctcgattag 


cggagacgca 


aacccatgta 


acaccgtgcg 


32459 


gaaaaactta 


aagaaatccg 


cgtcgctcgc 


gccgtcgcgc 


gcgcgggggg 


cgcgtagtac 


32519 


ctccacacac 


gattctgcac 


ttgtactacc 


acgcgaacct 


gatgcggttt 


accggtcatc 


32579 


gattggctgc 


gaggcttgct 


gttactggtg 


gtggtagact 


ggtagtacgt 


tgcttgtact 


32639 


acctcactca 


tgtctggaga 


ttactacact 


tcgatctttt 


cctctgtttt 


gttaattgag 


32699 


atttggaggt 


gttactgttc 


gctgtgtggt 


taagtatatt 


ggtgtataac 


tacaagttgg 


32759 


uaccGL.caaa. 


gggaaaaaaa 


gguacugcaa 


actggctaat 


ctangactcc 


attctgcaca


32819


tgcatataga 


taagcactat 


aataaggaac 


tgaggatcgt 


gaaaagtggc 


attaattata 


32879 


acaggaccat 


gtacgactat 


accactggca 


gggatttcac 


ggaatcaact 


ataggagtag 


32939 


gttagttggc 


acttggcaag 


gttgattgat 


tcactaacgt 


ggggaaaaga 


acacacgaga 


32999 
tcaaaggctg tcgtgggctt aaaataaaag ggcccatctg ggatcagctc ttttaagccc 33059
acatcactag ccaggaggct aggagtccag tattgcctcg tactgggccg tcctctgaaa 33119 


tttggaggcc ctgtctaaaa ttctaatcaa gccttaaact taagtgacaa aataaaaaga 33179 

ggtagactat ataacagcat accattacaa cggaatagct gtcgttagca cgatactcta 33239 

tatgcatcag atatggtacc aggtactata ccgacgttag catgatccga taggtatagg 33299 

atctggtgta cctagatatt atgctaacat aatcatgaca tcagctattc cattggaatg 33359 

atataccggt ggtatcttcg gtaaattgtg agcatgctag gaatttaagt aaagggcctt 33419 

agggttaaaa tcacacgttc ttagtcactg cactatcaag tgcatttcaa ccctaatgcc 33479 

cttttatgat ctatatctgc cctcctagcc tattttggac gaggctccct cgtcctagaa 33539 

gtaaatcatc gtatccataa tccaaccgat tagtagagaa aaaacatact tttcgaacgc 33599 

aacagttctt gtcatcttgt gctctcaaat gttcattttc cccttactta aaggacatgg 33659 

aaaacagaac agaccc 33675

<210> 3 

<211> 1119 

<212> DNA 

<213> Escherichia coli 

<220> 

<221> CDS 

<222> (1)..(1119)

<400> 3 

atg cat aac cag gct cca att caa cgt aga aaa tca aca cgt att tac 48

Met His Asn Gln Ala Pro Ile Gln Arg Arg Lys Ser Thr Arg Ile Tyr
1 5 10 15

gtt ggg aat gtg ccg att ggc gat ggt gct ccc atc gcc gta cag tcc 96
Val Gly Asn Val Pro Ile Gly Asp Gly Ala Pro Ile Ala Val Gln Ser

20 25 30 

atg acc aat acg cgt acg aca gac gtc gaa gca acg gtc aat caa atc 144
Met Thr Asn Thr Arg Thr Thr Asp Val Glu Ala Thr Val Asn Gln Ile
35 40 45 

aag gcg ctg gaa cgc gtt ggc gct gat atc gtc cgt gta tcc gta ccg 192
Lys Ala Leu Glu Arg Val Gly Ala Asp Ile Val Arg Val Ser Val Pro
50 55 60 

acg atg gac gcg gca gaa gcg ttc aaa ctc atc aaa cag cag gtt aac 240
Thr Met Asp Ala Ala Glu Ala Phe Lys Leu Ile Lys Gln Gln Val Asn
65 70 75 80 

gtg ccg ctg gtg gct gac atc cac ttc gac tat cgc att gcg ctg aaa 288

Val Pro Leu Val Ala Asp Ile His Phe Asp Tyr Arg Ile Ala Leu Lys

85 90 95 

gta gcg gaa tac ggc gtc gat tgt ctg cgt att aac cct ggc aat atc 336
Val Ala Glu Tyr Gly Val Asp Cys Leu Arg Ile Asn Pro Gly Asn Ile

100 105 110
ggt aat gaa gag cgt att cgc atg gtg gtt gac tgt gcg cgc gat aaa 384

Gly Asn Glu Glu Arg Ile Arg Met Val Val Asp Cys Ala Arg Asp Lys
115 120 125 

aac att ccg atc cgt att ggc gtt aac gcc gga tcg ctg gaa aaa gat 432

Asn Ile Pro Ile Arg Ile Gly Val Asn Ala Gly Ser Leu Glu Lys Asp
130 135 140 

ctg caa gaa aag tat ggc gaa ccg acg ccg cag gcg ttg ctg gaa tct 480

Leu Gln Glu Lys Tyr Gly Glu Pro Thr Pro Gln Ala Leu Leu Glu Ser
145 150 155 160 

gcc atg cgt cat gtt gat cat ctc gat cgc ctg aac ttc gat cag ttc 528
Ala Met Arg His Val Asp His Leu Asp Arg Leu Asn Phe Asp Gln Phe

165 170 175 

aaa gtc agc gtg aaa gcg tct gac gtc ttc ctc gct gtt gag tct tat 576
Lys Val Ser Val Lys Ala Ser Asp Val Phe Leu Ala Val Glu Ser Tyr 

180 185 190 

cgt ttg ctg gca aaa cag atc gat cag ccg ttg cat ctg ggg atc acc 624
Arg Leu Leu Ala Lys Gln Ile Asp Gln Pro Leu His Leu Gly Ile Thr
195 200 205 

gaa gcc ggt ggt gcg cgc agc ggg gca gta aaa tcc gcc att ggt tta 672
Glu Ala Gly Gly Ala Arg Ser Gly Ala Val Lys Ser Ala Ile Gly Leu
210 215 220 

ggt ctg ctg ctg tct gaa ggc atc ggc gac acg ctg cgc gta tcg ctg 720
Gly Leu Leu Leu Ser Glu Gly Ile Gly Asp Thr Leu Arg Val Ser Leu
225 230 235 240 

gcg gcc gat ccg gtc gaa gag atc aaa gtc ggt ttc gat att ttg aaa 768
Ala Ala Asp Pro Val Glu Glu Ile Lys Val Gly Phe Asp Ile Leu Lys

245 250 255 

tcg ctg cgt atc cgt tcg cga ggg atc aac ttc atc gcc tgc ccg acc 816
Ser Leu Arg Ile Arg Ser Arg Gly Ile Asn Phe Ile Ala Cys Pro Thr

260 265 270 

tgt tcg cgt cag gaa ttt gat gtt atc ggt acg gtt aac gcg ctg gag 864

Cys Ser Arg Gln Glu Phe Asp Val Ile Gly Thr Val Asn Ala Leu Glu
275 280 285 

caa cgc ctg gaa gat atc atc act ccg atg gac gtt tcg att atc ggc 912
Gln Arg Leu Glu Asp Ile Ile Thr Pro Met Asp Val Ser Ile Ile Gly
290 295 300 

tgc gtg gtg aat ggc cca ggt gag gcg ctg gtt tct aca ctc ggc gtc 960

Cys Val Val Asn Gly Pro Gly Glu Ala Leu Val Ser Thr Leu Gly Val 
305 310 315 320 

acc ggc ggc aac aag aaa agc ggc ctc tat gaa gat ggc gtg cgc aaa 1008
Thr Gly Gly Asn Lys Lys Ser Gly Leu Tyr Glu Asp Gly Val Arg Lys 

325 330 335 

gac cgt ctg gac aac aac gat atg atc gac cag ctg gaa gca cgc att 1056
Asp Arg Leu Asp Asn Asn Asp Met Ile Asp Gln Leu Glu Ala Arg Ile

340 345 350 

cgt gcg aaa gcc agt cag ctg gac gaa gcg cgt cga att gac gtt cag 1104
Arg Ala Lys Ala Ser Gln Leu Asp Glu Ala Arg Arg Ile Asp Val Gln
355 360 365 
Gln Val Glu Lys
370 



PCT/US01/24335



1119 



<210> 4 

<211> 686 

<212> PRT 

<213> Oryza sativa

<400> 4 

Met Ala Thr Gly Val Ala Pro Ala Pro Leu Pro His Val Arg Val Arg 
1 5 10 15

Asp Gly Gly Ile Gly Phe Thr Arg Ser Val Asp Phe Ala Lys Ile Leu

20 25 30 

Ser Val Pro Ala Thr Leu Arg Val Gly Ser Ser Arg Gly Arg Val Leu 
35 40 45 

Val Ala Lys Ser Ser Ser Thr Gly Ser Asp Thr Met Glu Leu Glu Pro 
50 55 60 

Ser Ser Glu Gly Ser Pro Leu Leu Gly Ile Thr Arg Arg Leu Leu Phe
65 70 75 80

Thr Leu His Met Val Gly Asn Val Pro Leu Gly Ser Asp His Pro Ile

85 90 95 

Arg Ile Gln Thr Met Thr Thr Ser Asp Thr Lys Asp Val Ala Lys Thr

100 105 110 

Val Glu Glu Val Met Arg Ile Ala Asp Lys Gly Ala Asp Phe Val Arg
115 120 125

Ile Thr Val Gln Gly Arg Lys Glu Ala Asp Ala Cys Phe Glu Ile Lys
130 135 140

Asn Thr Leu Val Gln Lys Asn Tyr Asn Ile Pro Leu Val Ala Asp Ile
145 150 155 160 

His Phe Ala Pro Thr Val Ala Leu Arg Val Ala Glu Cys Phe Asp Lys 

165 170 175 

Ile Arg Val Asn Pro Gly Asn Phe Ala Asp Arg Arg Ala Gln Phe Glu

180 185 190 

Gln Leu Glu Tyr Thr Glu Asp Asp Tyr Gln Lys Glu Leu Glu His Ile
195 200 205

Glu Lys Val Pro Asn Ile Ser Leu Phe Ser Val Asn Leu Val Phe Ser
210 215 220

Pro Leu Val Glu Lys Cys Lys Gln Tyr Gly Arg Ala Met Arg Ile Gly

225 230 235 240

Thr Asn His Gly Ser Leu Ser Asp Arg Ile Met Ser Tyr Tyr Gly Asp

245 250 255 

Ser Pro Arg Gly Met Val Glu Ser Ala Leu Glu Phe Ala Arg Ile Cys

260 265 270
Arg Lys Leu Asp Phe His Asn Phe Val Phe Ser Met Lys Ala Ser Asn
275 280 285

Pro Val Ile Met Val Gln Ala Tyr Arg Leu Leu Val Ala Glu Met Tyr
290 295 300

Asn Leu Gly Trp Asp Tyr Pro Leu His Leu Gly Val Thr Glu Ala Gly
305 310 315 320

Glu Gly Glu Asp Gly Arg Met Lys Ser Ala Ile Gly Ile Gly Thr Leu
325 330 335

Leu Met Asp Gly Leu Gly Asp Thr Ile Arg Val Ser Leu Thr Glu Pro
340 345 350

Pro Glu Glu Glu Ile Asp Pro Cys Arg Arg Leu Ala Asn Leu Gly Thr
355 360 365

His Ala Ala Asp Leu Gln Ile Gly Val Ala Pro Phe Glu Glu Lys His
370 375 380

Arg Arg Tyr Phe Asp Phe Gln Arg Arg Ser Gly Gln Leu Pro Leu Gln
385 390 395 400

Lys Glu Ala Pro Glu Leu Leu Tyr Arg Ser Leu Ala Ala Lys Leu Val
405 410 415

Val Gly Met Pro Phe Lys Asp Leu Ala Thr Val Asp Ser Ile Leu Leu
420 425 430

Lys Glu Leu Pro Pro Val Glu Asp Ala Gln Ala Arg Leu Ala Leu Lys
435 440 445

Arg Leu Val Asp Ile Ser Met Gly Val Leu Thr Pro Leu Ser Glu Gln
450 455 460

Leu Thr Lys Pro Leu Pro His Ala Ile Ala Leu Val Asn Val Asp Glu
465 470 475 480

Leu Ser Ser Gly Ala His Lys Leu Leu Pro Glu Gly Thr Arg Leu Ala
485 490 495

Val Thr Leu Arg Gly Asp Glu Ser Tyr Glu Gln Leu Asp Leu Leu Lys
500 505 510

Gly Val Asp Asp Ile Thr Met Leu Leu His Ser Val Pro Tyr Gly Glu
515 520 525

Glu Lys Thr Gly Arg Val His Ala Ala Arg Arg Leu Phe Glu Tyr Leu
530 535 540

Glu Thr Asn Gly Leu Asn Phe Pro Val Ile His His Ile Glu Phe Pro
545 550 555 560

Lys Ser Val Asn Arg Asp Asp Leu Val Ile Gly Ala Gly Ala Asn Val
565 570 575

Gly Ala Leu Leu Val Asp Gly Leu Gly Asp Gly Val Leu Leu Glu Ala
580 585 590

Ala Asp Gln Glu Phe Glu Phe Leu Arg Asp Thr Ser Phe Asn Leu Leu
595 600 605
605 
Gln Gly Cys Arg Met Arg Asn Thr Lys Thr Ile Ala Ile Met Gly Cys
610 615 620

Ile Val Asn Gly Pro Gly Glu Met Ala Asp Ala Asp Phe Gly Tyr Val
625 630 635 640

Gly Gly Ala Pro Gly Lys Ile Asp Leu Tyr Val Gly Lys Thr Val Val

645 650 655 

Gln Arg Gly Ile Ala Met Glu Gly Ala Thr Asp Ala Leu Ile Gln Leu

660 665 670 

Ile Lys Asp His Gly Arg Trp Val Asp Pro Pro Val Glu Glu
675 680 685 



<210> 5 

<211> 594 

<212> DNA 

<213> Arabidopsis thaliana 
<220> 

<221> unsure 

<222> (1)..(594)

<223> unsure at all n locations 



<400> 5










aaaatcgtca 


atccctctca 


aactcttctc 


accactaatt 


tcttcctctg 


gaacattctc 


ttctctatta


ttttgattcc


cttggcctca


acactggttt 


ctcaattgca


tgatcttggc 


tcgtcttcag 


ttactttgat 


tcactgagaa 


aaatggcgac 


tggagtattg 


ccagctccgg 


tttctgggat 


caagataccg 


gattcgaaag 


tcgggtttgg 


taaaagcatg 


aatcttgtga 


gaatttgtna 

> 


tgttaggagt 


ctaagatctg 


ctaggagaag 


agtttcggtt 


ateeggaatt 


caaaccaagg 


ctctgattta 


gctgagcttc 


aaccctgcat 


ccgaaggaaa 


gcccctcttc 


ttagtgccaa 


ggcaggaaat 


attgtgaatc 


attgeataan 


gcggttagga 


ggaagnctcg 


gacctgtaat 


ggttgaaatg 


tcgncccttn 


gaagngnaca 


ccggtanggg 


teaaaeggtg 


ccttcttngg 


gtacaaaang 


tnttccttgg 


ancctnttng 


tgggggtttt 


gggattgegg 


aaaaaggggc 


tgnttttnaa 


gggnacctnn 


caaggnagna 


agggngggtc 


tttt 


<210> 6
<211> 615
<212> DNA
<213> Glycine max
<220>
<221> unsure
<222> (1..615)
<223> unsure at all n locations
<400> 6
accagaagtg atgagcctta tgaagaactg gacattctta agggtgttga tgctactatg 60
cttttccatg accttcctta tacagaagac agaattagca gagtgcatgc aaccagacgg 120






WO 02/12478 



PCT/US01/24335



ttatttgagt acctatctga caattctcta aacttccctg ttattcacca tattcagttc 180
ccaaatggga ttcacaggga tgacttggta attggtgctg gttctgatgc tggagccctt 240
ctggttgatg ggcttggaga tggactactt ttggaagccc cggacaagga ttttgaattt 300
attagaaaca cttctttcaa tttgttgcaa ggctgcagaa tgagaaatac aaagacagag 360
tatgtctcat gtccatcctg tggcagaaca ttgtttgatc ttcaagaagt aagtgcacaa 420
attcgggaga agacatcaca cctncctggt gtttcgattg caatcatggg atgcattgtt 480
aatggaccag gggagatggc tgatgcagac tttgggtatg tgggaagcac tccccggaag 540
attgacctct atgttgggaa gactggtgtg aagcgtggga attcaatgga gcatgccaac 600
catggcttga tccga 615


<210> 7
<211> 589
<212> DNA
<213> Lycopersicon esculentum
<400> 7
tggcgatgaa tcacatgatg agttggaaat cctgaagagc tctgatgtta caatgattct 60
tcataatctg ccatatacag aggaaaaaat tggcagggtt caagcagcca ggaggctttt 120
tgagtatctt tccgagaatt ccttgaactt tccagtgatt catcacatac aatttcccag 180
caacacccac agagatgact tagtgattgg tgccgggaca aatgcgggag ccctcttggt 240
agatgggctt ggtgatggac ttctcttgga agctccagac aaggattttg attttctcag 300
aaatacatct ttcaatttgc ttcaaggttg cagaatgcgg aacacaaaaa cggaatatgt 360
atcatgccca tcctgtggca gaactttatt cgatcttcaa gagataagcg ctcaaattag 420
agagaagacg tcacacttgc ctggtgtttc aattgccatc atgggttgca ttgtgaatgg 480
acctggggag atggctgatg ctgactttgg atatgttggt ggtgctcctg gaaagattga 540
cctttacgtc ggcaagacag tggtgaaacg ccctattgaa atggagcat 589



<210> 8 
<211> 617 
<212> DNA 

<213> Mesembryanthemum crystallinum 

<400> 8 

gaaaagcata gacattattt tgactttcaa cgtagaactg gtcaattacc gattcagaaa 60
gagggtgaag atgtggacta tagaggtgtc ctacaccgtg atggttctgt cctcatgact 120
gtttccttgg acatgttgaa gacacctgaa ctcctttaca agtcattagc agcaaagctt 180
gttgttggca tgccatttaa ggatctggct actgtagact ctatttttct gagagagctt 240
tcaccagtag atgactctga tgctcggcta gctctgaaga ggttaataga tataagtatg 300
ggtgtcatag ctcctttttc tgagcaactg acaaagccct tgccaaatgc aattgtattg 360
gtgaacctta aagagttgtc aaccggtgca tacaagcttt taccagtagg aacccgcttg 420
gcagtatctg tgcgaggtga tgaaccatat tgagacattg gagatcctta aagatattga 480
tgcttcaatg gctttttatg aactgtcttt taccgagagg atattcacac agtgcatgct 540
ggaccaaagc ttttgaggtc ctatcagata agcttggacc tcccgtaatt aacatatcct 600
atcccttcgg attaagg 617

<210> 9 

<211> 416 

<212> DNA

<213> Oryza sativa 

<220> 

<221> unsure
<222> (1..416)

<223> unsure at all n locations 

<400> 9 

ggattcggca cgagtctaat tgatggtctt ggtgatggtg tacttcttga aagctgctga 60
ccaagaaatt tgagtttttg agggacacat cctccaactt gttacagggc tgcaggatgc 120
gcaacacaaa aacggaatat ttccctggtc ctcctggtgg gcggacacnc tttnaccncc 180
aaaaattcan tgctcaaatt aaanaaaaaa ccnctcatct gccaggcntc tctattgcta 240
tcatgggtng cattgtcaat gggccagggg aaatggccaa tcctaattnc ggatacttng 300
gaggtgccct ggagaaaatc nacctntatn ttggttnttt tttttnnaac ggggcatngc 360
aanagaaggg ggcccnnacc ccnanatncn ttcnccgggn ccngggccgn ggggtt 416

<210> 10 

<211> 621 

<212> DNA 

<213> Zea mays 

<400> 10 

gaattcggca ccagaagcca ctcccacatg caattgtact tgtcaacctc gacgaattgt 60
caagtggtgc acacaaactt ttgccagaag gcactagact agctgtcact cttcgtggtg 120
atgaatcata cgagcagcta gatattctta aggatgttga tgatataaca atgttgttac 180
ataatgttcc atatggtgag gagaagacag gcagggtgca tgctgctagg aggttatttg 240
agtacttaca ggccaatggc ttgaacttcc ctgtaattca tcacataaat ttccctgaaa 300
ccattgacag agatggtctt gtcattggtg ctggggccaa cgttggtgct ctcttagtcg 360
atggtcttgg tgatggtgta ttccttgaag ctgctgacca ggaatttgag tttctgaggg 420
acacatcttt caacttgctc caaggttgca ggatgcgcaa cacaaaaact gaatatgtgt 480
cttgtccttc ctgcggccga acactctttg accttcagga aatcagcgct gagattagag 540
aaaagacctc tcatctgcca ggtgtctcga tcgctatcat gggctgtatt gcaatggacc 600
aggagagatg gctgatgccg a 621



<210> 11 

<211> 601 

<212> DNA 

<213> Pinus taeda 

<220> 

<221> unsure

<222> (1..601)

<223> unsure at all n locations 

<400> 11 



aatgcaagaa gtacggaagg gcaatgcgaa ttggcacaaa ccatggaagt ctttccgatc 60
gtactatgag ttattatggt gattctccca ggggtatggt ggaatcagca tttgaatttg 120
cacgcatttg ccggaagttg ggttttcata attttgtgtt ttcaatgaaa gcgagcgatc 180
ctgtagtcat ggttcaggca taccgtttac ttgttgcgga gatgtatgtg caaggatggg 240
attatccatt gcatttagga gttactgaag ctggtgaagg tgaagatgga cgcatgaagt 300
ctgcaattgg cattggaaca cttttgcagg atggtttggg tgatactatt cgagtttccc 360
ttacagaacc tccagaagag gagatcaatc cctgtagaag acttgcaaat cttgggatgc 420
aagctgcaaa gctanggaaa ggagtggctc cttttgagga gaacatcgtc attactttac 480
tttccaacgc angactggcn agctccagta cagaaggagg gtgatgaggt ggatacagag 540
gagtccgcat cgtgatggtc tgttctaatg tcagtgtcct tgacagntga agacacanaa 600
a 601


<210> 12
<211> 443
<212> DNA
<213> Physcomitrella patens
<400> 12
gcacgtatct gccgcaaaca tgactatatt aatttcttgt tttctatgaa agcaagcaat 60
ccggtcgtaa tggttcaagc atatcggctt ttagtatctg agatgtatgt gaacaactgg 120
gactacccat tacatcttgg tgttactgag gctggagagg gagaggatgg tcgcatgaag 180
tcagctatcg gcattggtgc tttacttcag gatggtctcg gtgacaccat acgtgtttca 240
ttgacggaag ctcctgaaga agaaattgat ccttgcacaa agcttgcaaa ccttggcatg 300
aagatttctg cagaacagaa gggggtggct gaattcgaag agaagcaccg gcgatacttt 360
gacttccaac gaaggaccgg ccaacttcca ctgcagaggg agggagagtt ggtggactac 420
agaaacgttc tgcaccgtga tgg 443
<210> 13
<211> 938
<212> DNA
<213> Arabidopsis thaliana
<220>
<221> unsure
<222> (1..938)
<223> unsure at all n locations
<400> 13
atgatactgc cagctannnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 60
nnnnnnnnnn nnnnnnnnnn nnnccacgcg tccgaaaacg ttttatcctg agtttctttc 120
accatccagc ttcatttgtg aaaaatcgtc aatccctctc aaactcttct caccactaat 180
ttcttcctct ggaacattct cttctctatt attttgattc ccttggcctc aacactggtt 240
tctcaattgc atgatcttgg ctcgtcttca gttactttga ttcactgaga aaaatggcga 300
ctggagtatt gccagctccg gtttctggga tcaagatacc ggattcgaaa gtcgggtttg 360
gtaaaagcat gaatcttgtg agaatttgtg atgttaggag tctaagatct gctaggagaa 420
gagtttcggt tatccggaat tcaaaccaag gctctgattt agctgagctt caacctgcat 480
ccgaaggaag ccctctctta gtgccaagac agaaatattg tgaatcattg cataagacgg 540
tgagaaggaa gactcgtact gttatggttg gaaatgtcgc ccttggaagc gaacatccga 600
taaggattca aacgatgact acttcggata caaaagatat tactggaact gttgatgagg 660
ttatgagaat agcggataaa ggagctgata ttgtaaggat aactgtccaa gggaagaaag 720
aggcggatgc gtgctttgaa ataaaagata aactcgttca gcttaattac aatataccgc 780
tggttgcaga tattcattgt gcccctactg tagccttacg agtcgctgaa tgctttgaca 840


agatccgtgt


caacccagga 


aattttgcgg 






auya u Ltja i_ L. 




atacagaaga tgaatatcag aaagaactcc agcatatc 938


<210> 14
<211> 432
<212> DNA
<213> Arabidopsis thaliana
<400> 14
agcataacaa ggctctgatt tagctgagct tcaacctgca tccgaaggaa gccctctctt 60
agtgccaaga cagaaatatt gtgaatcatt gcataagacg gtgagaagga agactcgtac 120
tgttatggtt ggaaatgtcg cccttggaag cgaacatccg ataaggattc aaacgatgac 180
tacttcggat acaaaagata ttactggaac tgttgatgag gttatgagaa tagcggataa 240
aggagctgat attgtaagga taactgttca agggaagaaa gaggcggatg cgtgctttga 300
aataaaagat aaactcgttc agcttaatta caatataccg ctggttgcag atattcattt 360
tgcccctact gtagccttac gagtcgctga atgctttgac aagatccgtg tcaacccaag 420
aaattttgcg ga 432

<210> 15
<211> 528
<212> DNA
<213> Arabidopsis thaliana
<220>
<221> unsure
<222> (1..528)
<223> unsure at all n locations
<400> 15
tgatacgcca
t~r f~ f* t~ s t~ 3 C"« r~r
yctcuaLaCy
a 0 "Ts ^ m *h*
aCtCaCLaUL
tccgggaatt
cccngggccg
acccacgcgt
ttcactcctt
3 "H fr/"* 3 3 3 3 3
d Ly L-aad.d.d.y
catggaagtc
LaLLa L.y d.y c
gaatctgcgt
u uyay LL Lye
ci ci CJcl cl U a. U gU
tcaatgaaag
tTT3 rrr^^J ^5 f* t~» (~>
cty L-ycLuv-ci uy
atgtatgttc
4— ffcfa t™ rrrrrta
cii - , yy cii -yyy ci
)- f- rif- (-"n't" f* cr
gaagatggac
y y «. LyaaaL 0
t~ r~r f* rt a l-tr /-trra
uycydLugga
gacacaataa
gagtttcact
gacggagcca
agggaagctg gtacgcctgc aggtacccgg 60
ccgaaagaac tccagcatat cgagcaggtc 120
tacgggagag caatgcgtat tgggacaaat 180
tattacgggg attctccccg aggaatggtt 240
cggaaattag actatcacaa ctttgttttc 300
gtccaggcgt accgtttact tgtggctgag 360
catttgggag ttactgaggc aggagaaggc 420
attgggacgc ttcttcagga cgggctcggt 480
ccagaagagg agatagat 528

<210> 16
<211> 379
<212> DNA
<213> Arabidopsis thaliana
<400> 16
gcgtattggg acaaatcatg gaagtctttc tgaccgtatc atgagctatt acggggattc 60
tccccgagga atggttgaat ctgcgtttga gtttgcaaga atatgtcgga aattagacta 120
tcacaacttt gttttctcaa tgaaagcgag caacccagtg atcatggtcc aggcgtaccg 180
tttacttgtg gctgagatgt atgttcatgg atgggattat cctttgcatt tgggagttac 240
tgaggcagga gaaggcgaag atggacggat gaaatctgcg attggaattg ggacgcttct 300
tcaggacggg ctcggtgaca caataagagt ttcactgacg gagccaccag aagaggagat 360
agatccctgc aagcgattg 379
<210> 17
<211> 395
<212> DNA
<213> Arabidopsis thaliana
<400> 17
aaagaactcc agcatatcga gcaggtcttc actcctttgg ttgagaaatg caaaaagtac 60
gggagagcaa tgcgtattgg gacaaatcat ggaagtcttt ctgaccgtat catgagctat 120
tacggggatt ctccccgagg aatggttgaa tctgcgtttg agtttgcaag aatatgtcgg 180
aaattagact atcacaactt tgttttctca atgaaagcga gcaacccagt gatcatggtc 240
caggcgtacc gtttacttgt ggctgagatg tatgttcatg gatgggatta tcctttgcat 300
ttgggagtta ctgaggcagg agaaggcgaa gatggacgga tgaaatctgc gattggaatt 360
ggggacactt cttcaggacg ggctcggtga cacaa 395


<210> 18
<211> 395
<212> DNA
<213> Arabidopsis thaliana
<400> 18
aaagaactcc agcatatcga gcaggtcttc actcctttgg ttgagaaatg caaaaagtac 60
gggagagcaa tgcgtattgg gacaaatcat ggaagtcttt ctgaccgtat catgagctat 120
tacggggatt ctccccgagg aatggttgaa tctgcgtttg agtttgcaag aatatgtcgg 180
gaattagact atcacaactt tgttttctca atgaaagcga gcaacccagt gatcatggtc 240
caggcgtacc gtttacttgt ggctgagatg tatgttcatg gatgggatta tcctttgcat 300
ttgggagtta ctgatgcagg agaaggcgaa gatggacgga tgaaatctgc gattggaatt 360
gggacgcttc ttcaggacgg gctcggtgac acaat 395


<210> 19
<211> 412
<212> DNA
<213> Arabidopsis thaliana
<400> 19
atgctggagg ccttcttgtg gatggactag gtgatggcgt aatgctcgaa gcacctgacc 60
aagattttga ttttcttagg aatacttcct tcaacttatt acaaggatgc agaatgcgta 120
acactaagac ggaatatgta tcgtgcccgt cttgtggaag aacgcttttc gacttgcaag 180
aaatcagcgc cgagatccga gaaaagactt cccatttacc tggcgtttcg atcgcaatca 240
tgggatgcat tgtgaatgga ccaggagaaa tggcagatgc tgatttcgga tatgtaggtg 300
gttctcccgg aaaaatcgac ctttatgtcg gaaagacggt ggtgaagcgt gggatagcta 360
tgacggaggc aacagatgct ctgatcggtc tgatcaaaga acatggtcgt tg 412






<210> 20 

<211> 1172 

<212> DNA 

<213> Arabidopsis thaliana 
<220> 

<221> unsure 

<222> (1..1172)

<223> unsure at all n locations 

<400> 20 



gggtatgcca ttcaaggatc tggcaactgt tgattcaatc ttattaaaga gagctaccgc 60
ctgtagatga tcaagtggct cgtttggctc taaaacggtt gattgatgtc agtatgggag 120
ttatagcacc tttatcagag caactaacaa agccattgcc caatgccatg gttcttgtca 180
acctcaagga actatctggt ggcgcttaca agcttctccc tgaaggtaca cgcttggttg 240
tctctctacg aggcgatgag ccttacgagg agcttgaaat actcaacaac attgatgcta 300
cgatgattct ccatgatgta cctttcactg aagacaaagt tagcagagta catgcagctc 360
ggaggctatt cgagttctta tccgagaatt cagttaactt tcctgttatt catcacataa 420
acttcccaac cggaatccac agagacgaat tggtgattca tgcagggaca tatgctggag 480
gccttcttgt ggatggacta cgtgatggcg taatgctcga agcacctgac caagattttg 540
attttcttag gaatacttcc ttcaacttat tacaaggatg cagaatgcgt aacactaaga 600
cggaatatgt atcgtgcccg tcttgtggaa gaacgctttt cgacttgcaa gaaatcagcg 660
ccgagatccg agaaaagact tcccatttac ctggcgtttc gatcgcaatc atgggatgca 720
ttgtgaatgg accaggagaa atggcagatg ctgatttcgg atatgtaggt ggttctcccg 780
gaaaaatcga cctttatgtc ggaaagacgg tggtgaagcg tgggatagct atgacggagg 840
caacagatgc tctgatcggt ctgatcaaag aacatggtcg ttgggtcgac ccgcccgtgg 900
ccgatgagta gatttcaaaa cggagaaaga tgggtgggcc attctttgaa aactgtgaga 960
ggagatatat atatttgtgt gtgtatatca tctgtttgtt gtgtattgca tcattcattt 1020
tggacaaatg tccaaattct cttaagttga taaaagttct taggccaaat taaatttaat 1080
ataaaaaaaa aaaaaaaaag gcnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 1140
nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nn 1172


<210> 21
<211> 584
<212> DNA
<213> Zea mays
<400> 21
caggttaatt aattcctgta cgccgtcggt ttcgggtact cgtttaattt cttcccgacc 60
acggttgatg gcaatgtaac cggcttgttt acccacatag ccatagtcgg catcggccat 120
ttccccgggg ccattgacaa tacagcccat gacggcgatg tctaaacccg ttagatgttt 180
agtggcttct cggacttcat gtaacacgtc ttccaagttg aacaacgtgc ggccacagga 240
aggacaggcc acatattcca ccatggtttt ccgcaaaccc agcgcctgga gaatgctgta 300
gcaaacggga atttcttttt cgggggcttc ggtgagggat acccggatag tatcgccaat 360
gccatcagct aaaagggtgg caatgccagc ggtggattta atgcggccat attccccatc 420
cccggcttcg gtaaccccta gatggagggg ataatccatg cccaactcgt tcatacgttt 480
caccatgagg cgataggcgg ccaacattac cggtacccgg gacgctttca tggaaacgac 540
taggttgcgg aaatctaaag actcacaaat tttgatgaat tcca 584


<210> 22
<211> 670
<212> DNA
<213> Zea mays
<400> 22
caggtcgact ctagaggatc ggcgttaacc atggttctct ctccgaaaga atgcttttac 60
ctacttttta cccccgaggg catggtgcaa tcggccctgg aattcatcaa aatttgtgag 120
tccttagatt tccgcaacct agtcgtttcc atgaaagcgt cccgggtacc ggtaatgttg 180
gccgcctatc gcctcatggt gaaacgtatg gacgagttgg gcatggatta tcccctccat 240
ctaggggtta ccgaagccgg ggatggggaa tatggccgca ttaaatccac cgctggcatt 300
gccacccttt tagctgatgg cattggcgat actatccggg tatccctcac cgaagccccc 360
gaaaaagaaa ttcccgtttg ctacagcatt ctccaggcgc tgggtttgcg gaaaaccatg 420
gtggaatatg tggcctgtcc ttcctgtggc cgcacgttgt tcaacttgga agacgtgtta 480
catgaagtcc gagatgccac taaacatcta acgggtttag actttcgccg tcatgggctg 540
tattgtcaat ggccccgggg caatggccga tgccgactat ggctatgtgg gtaaacaagc 600
cggttacatt gccatcaacc gtggtcggga agaaattaaa cgagtacccg aaaccgacgg 660
cgtacaggaa 670


<210> 23
<211> 596
<212> DNA
<213> Zea mays
<220>
<221> unsure
<222> (1..596)
<223> unsure at all n locations
<400> 23
caggtcgact ctagaggatc ggcgttaacc atggttctct ctccgaaaga atgcttttac 60
ctacttttta cccccgaggg catggtgcaa tcggccctgg aattcatcaa aatttgtgag 120
tccttagatt tccgcaacct agtcgtttcc atgaaagcgt cccgggtacc ggtaatgttg 180
gccgcctatc gcctcatggt gaaacgtatg gacgagttgg gcatggatta tcccctccat 240
ctaggggtta ccgaagccgg ggatggggaa tatggccgca ttaaatccac cgctggcatt 300
gccacccttt tagctgatgg cattggcgat actatccggg tatccctcac cgaagccccc 360
gaaaaagaaa ttcccgtttg ctacagcatt ctccaggcgc tgggtttgcg gaaaaccatg 420
gtggaatatg tggcctgtcc ttcctgtggc cgcacgttgt tcaacttgga agacgtgtta 480
catgaagtcc gagatgccac taaacatcta acgtgtttag actttcgncg tcatgtgctg 540
tattgtcaat ggccccggtg caatggccga tgccgactat ggctatgtgg gtaaac 596


<210> 24
<211> 403
<212> DNA
<213> Zea mays
<400> 24
cagacaagga ggaggaaaac tcgaactgtg atggtgggga atgtgccact tgggagtgat 60
caccccataa ggattcaaac catgacgact tcagatacca aggatgttgc gaaaacagta 120
gaggaggtga tgaggatagc agataaagga gctgatcttg ttagaataac agtccagggt 180
aggaaggaag ctgatgcctg ctttgagatc aagaacactc tggttcagaa gaattacaac 240
attccactag tggccgatat tcattttgct cctacggtag ctctaaaggt ggcagaatgt 300
tttgacaaaa ttcgtgtgaa cccaggaaat tttgctgatc gtcgtgctca atttgaaaag 360
chggaatata ctgacgacga ctaccaaaaa gagctagagc ata 403


<210> 25
<211> 293
<212> DNA
<213> Zea mays
<400> 25
cagacaaggc ggaggaaaac tcgaactgtg atggtgggga atgtgccact tggcagtgat 60
caccccataa ggattcaaac catgacgact tcagatacca aggatgttgc gaaaacagta 120
gaggaggtga tgaggatagc agataaagga gctgatcttg ttagaataac agtccagggt 180
aggaaggaag ctgatgcctg ctttgagatc aagaacactc tggttcagaa gaattacaac 240
attccactag tggccgatat tcattttgct cctacggtag ctctaagggt ggc 293



<210> 26
<211> 456
<212> DNA
<213> Zea mays
<400> 26
cagacaaggc ggaggaaaac tcgaactgtg atggtgggga atgtgccact tggcagtgat 60
caccccataa ggattcaaac catgacgact tcagatacca aggatgttgc gaaaacagta 120
gaggaggtga tgaggattgc agataaagga gctgatcttg ttagaataac agtccagggt 180
aggaaggaag ctgatgcctg ctttgagatc aagaacaact ctggttcaga agaattacaa 240
ccttccacta gtggacctga tattcatttt gctccttcag tagctttaaa ggtggcagaa 300
tgtttggaca aattaattga aacacacaat ttcttgttga tagtgtacct taattagaaa 360
agctggaatt taccggctac gacttccata aagcgcttgg gcttgtttaa caattggttt 420
ttaccttaat cgaatatttc acagaaattt gaattt 456

<210> 27
<211> 619
<212> DNA
<213> Zea mays
<400> 27
caccgaaggt ttctaattta tttctcagat ctcaataaat gtacaaaatg tgtagggatg 60
atgtacattg tatgctcagt tcctgcattg cgtgtttcgc tttacagaat atataaacta 120
cagacttggc tacagcctac agccctactc ctcggcagga ggatccaccc atcggccatg 180
gtccttgatc agctggatca aggcgtcagt tgcaccttcc atggcgatgg cgcgctgcac 240
aacggtcttg ccaacataaa ggtcgatctt tccgggagcg cctccaacgt atccgaaatc 300
ggcatcagcc atctctcctg gtccattgac aatacaaccc atgatagcga tcgaaacacc 360
tggcagatga gaggtctttt ctctaatctc agcgctgatt tcctgaaggt caaagagtgt 420
tcggccgcag gaaggacaag acacatattc agtttttgtg ttgcgcatcc tgcaaccttg 480
gagcaagttg aaagatgtgt ccctcaggaa ctcaaattcc tggtcagcag cttcaaggaa 540
tacaccatca ccaagaccat cgactaagag agcaccaacg ttggccccag caccaatgac 600
aagaccatct ctgtcaatg 619

<210> 28
<211> 422
<212> DNA
<213>
<400> 28
tcgcttgcac ttgggtgtta cagaagctgg agagggtgas gstggaagga tgaaatctgc 60
tattggcatt gggacactgc taatggatgg tttgggtgat acaatccgtg tctccctcac 120
agaaccacca gaagaagaga ttgatccttg ccaaaggttg gcaaatcttg ggacgcaggc 180
cgcaaacctt caaattgggg tggccccatt tgaagaaaag cacaggcgct attttgattt 240
ccagcgtagg agtggtcaat tgcctttgca gaaggaggga ggcgatagtt gactacagaa 300
atgtcctgca tcgtgatggt atctgactga tggcagtttc cctggatcag ttgaaggctc 360
ctgatctcct ttataggtat attgcagcaa agcttgcgga tggcatgcct ttcaaggatc 420
tg 422


<210> 29
<211> 430
<212> DNA
<213> Zea mays
<400> 29
tcgcttgcac ttgggtgtta cagaagctgg agagggtgaa gatggaagga tgaaatctgc 60
tattggcatt gggacactgc taatggatgg tttgggtgat acaatccgtg tctccctcac 120
agaaccacca gaagaagaga ttgatccttg ccaaaggttg gcaaatcttg ggacgcaggc 180
tgcaaacctt caaattgggg tggccccatt tgaagaaaag cacaggcgtt attttgattt 240
ccagcgtagg agtggtcaat tgcctttgca gaaggagggt gaggaagttg actacagaaa 300
tgtcctgcat cgtgatggta tctgtactga tggcagtttc cctggatcag ttgaaggctc 360
ctgatctcct ttataggtct cttgcagcaa agcttgcggt tggcatgcct ttcaaggatc 420
tggctactgt 430


<210> 30
<211> 528
<212> DNA
<213> Zea mays
<400> 30
gacaggcagg gtgcatgctg ctaggaggtt atttgagtac ttacaggcca atggcttgaa 60
cttccctgta attcatcaca taaatttccc tgaaaccatt gacagagatg gtcttgtcat 120
tggggctggg gccaacgttg gtgctctctt agtcgatggt cttggtgatg gtgtattcct 180
tgaggcggct gaccaggaat ttgagttcct gagggacaca tctttcaact tgctccaagg 240
ttgcaggatg cgcaacacaa aaactgaata tgtgtcttgt ccttcctgcg gccgaacact 300
ctttgacctt caggaaatca gcgctgagat tagcgaaaag acctctcatc tgccacgtgt 360
ttcgatcgct atcatgggtt gtattgtcaa tggaccagga gcgctggctg atgccgattt 420
cggatacgtt ggcggcgctc ccggaaagat cgacctttat attggcacga ccgttatgca 480
gcgcgccatc gccatggacg gtgcaactga cgccttgatc cagctgat 528



<210> 31
<211> 303
<212> DNA
<213> Zea mays
<400> 31
ggggccaacg ttggtgctct cttagtcgat ggtcttggtg atggtgtatt ccttgaggcg 60
gctgaccagg aatttgagtt cctgagggac acatctttca acttgctcca aggttgcagg 120
atgcgcaaca caaaaactga atatgtgtct tgtccttcct gcggccgaac actctttgac 180
cttcaggaaa tcagcgctga gattagagaa aagacctctc atctgccacg tgtttcgatc 240
gctatcatgg gttgtattgt caatggacca ggagagatgg ctgatgccga tttcggatac 300
gtt 303

<210> 32
<211> 613
<212> DNA
<213> Zea mays
<220>
<221> unsure
<222> (1..613)
<223> unsure at all n locations
<400> 32
cgagatggcg ttccatgccn ggcccttcct cctcttcctc ttcttctgcc cccccgctgg 60
cttggaaaag ggagagaaac tcgcgcactc ggttatcgaa gggaggagcg cgggcgaggg 120
tgaggtttcg cccacacgga gctgcgaggt gtttgtagga tctcctaggt gagcccctgc 180
tgcttggaga cagccatggc caccggcgtg gctccagctc ctctcccaca tgtcagagtg 240
cgtcatgggg gcgtcgggtt caccaggagc gtcgattttg cgaaggtctt gtctgctccc 300
ggtgccggca cgatgagagc aagctcctct agaggcaggg cgctcgtggc gaagagctct 360
agtactggct cggagaccat ggagctcgag ccatcttcag aaggaagccc acttttagta 420
cccaggcaga agtactgtga atcaacacac cagacaagga ggaggaaaac tcgaactgtg 480
atggtgggga atgtgccact tggcagtgat catcccataa ggattcaaac catgacgact 540
tcagatacca aggatgttgc aaaaacagta gaggaggtga tgaggatagc agataaagga 600
gctgatcttg tta 613

<210> 33
<211> 464
<212> DNA
<213> Glycine max
<400> 33
agagcatgaa atcttctgcg aggaaaaggg tgtcaattat cacgaactca aatcctggcc 60
aagatattgc tgaacttcaa cctgcatccc caggaagccc tcttttggtt cctaggcaaa 120
agtattgtga atcattgcac aaacccatca ggagaaaaac aagcacagta atggttggta 180
acgtggctat tggtagcgag catcctataa gaattcagac catgactaca actgacacta 240
aggatgttgc tgggacagtt gaacaggtga tgagaatagc agataaagga gctgatattg 300
tacggataac agttcaaggg aagaaagaag ctgatgcttg ttttgagatt aaaaacaccc 360
ttgtgcagaa aaattacaac atacccgtgg tggctgatat tcattttgct ccctctgttg 420
ctttgcgggt agctgaatgc tttgataaga ttcgtgtaaa ccct 464


<210> 34
<211> 705
<212> DNA
<213> Glycine max
<400> 34
gtagctgaat gctttgataa gattcgtgta aaccctggaa attttgctga tagacgggct 60
caatttgaaa cattagagta cacagaagaa gactatcaga aagaacttga gcatattgaa 120
aaggttttca caccattggt tgagaaatgt aagaaatatg ggagagcaat gcgcattggg 180
acaaaccatg gaagtctttc tgatcgtata atgagctact atggagactc gcctagggga 240
atggtagaat ctgcttttga atttgcaagg atatgccgaa agttagacta tcacaatttt 300
gttttttcta tgaaagcaag caacccagtt atcatggttc aggcataccg cttacttgtg 360
gctgaaatgt atgtccaagg ctgggattat ccattacact tgggtgttac tgaagctgga 420
gaaggtgagg atgggaggat gaagtctgca ataggcattg gaactcttct tcaggatgga 480


t~ t oao t era t a 


caattagggt 


ttctctcaca 


y ci a v_ cl ci y 


«-yy «-y y c*-y c^*- 




3 <± U 


agaaggttgg caaatcttgg aatgatagct tctgaactcc agaagggggt ggaacctttt 600
gaagaaaagc acagacatta ttttcgactt tcagcgccga tctggtcaat tgccagtgca 660
aaaagagggt gaggaggtgg attacagagg tgtactgcac cgtga 705



<210> 35 

<211> 564 

<212> DNA 

<213> Glycine max 

<220> 

<221> unsure

<222> (1..564)

<223> unsure at all n locations

<400> 35

aagcncggaa ttcggctcga gaggaactca aatcctggcc aagatattgc tgaacttcaa 60
cctgtatccc caggaagccc tcttttggtt cctaggcaaa agtattgtga atgattacac 120
aaaactgtca ggagaaaaac aaacacagtg atggttggta acgtggctat tggtagcgag 180



catcctataa gaattcagac catgactacg actgacacta aggatgttgc tgggacagtt 240
gaacaggtga tgagaatagc agataaagga gctgatattg tacggataac agttcaaggg 300
aagaaagaag ctgatgcttg ttttgagatt aaaaacaccc ttgttcagaa aaattacaac 360


a. L» L» O ^-j UV^nJ 






ccctctggtg 


ctttgcgggt 


agcugaauy c 


a o r» 


tttgataaga 


ttcgtgtaaa 


ccctggaaat 


c u u gc c gac a 


gacgggctca 


atttgaaaca 


480 


ttagagtaca cagatgatga ctatcagaaa gaacttgagc atactgaaaa ggttttcaca 540
ccattggttg agaaatgtaa gaaa 564


<210> 36
<211> 511
<212> DNA
<213> Glycine max
<400> 36












aaaccatgga 


agtctttctg 


atcgtataat


n^i erf 1 1~ •}— f- 
y oi y l. ct u» ct u 


yy»y« ( -tcyL. 


ctaggggaat 


60 


ggtagaatct 


gcttttgaat 


ttgcaaggat 


cl Ly y ctdciy 


L. i_ cty aLUctLu 


acaattttgt 


120 


tttttctatg 


aaagcaagca 


acccagttat 


Let L.y y L. U Lay 


y LdtdCCyCL 


tacttgtggc 


180 


tgaaatgtat gtccaaggct gggattatcc attacacttg ggtgttactg aagctggaga 240
aggtgaggat gggaggatga agtctgcaat aggcattgga actcttcttc aggatggatt 300
gggtgataca attagggttt ctctcacaga accaccagag gaggagatag acccttgcag 360
aaggttggca aatcttggaa tgatagcttc tgaactccag aagggggtgg aaccttttga 420
agaaaagcac agacattatt ttgactttca gcgccgatct ggtcaattgc cagtgcataa 480
agagggtgag gaggtggatt acagaggtgt a 511



<210> 37 

<211> 498 

<212> DNA 

<213> Glycine max 

<220> 

<221> unsure 

<222> (1. .498) 

<223> unsure at all n locations 



<400> 


37 












cggaggtggc 


gtgaatgctt 


tgataagatt 


cgtgtaaacc 


ctggaaattt 


tgctgataga 


60
cgggctcaat ttgaaacatg agagtggaca naataagact atgagaaaga acttgagcat 120
attgaaaagg ttttcacacc attggttgag aaatgtaaga aatatgggag agcaatgcgc 180
attgggacaa accatggaag tctttctgat cgtataatga gctactatgg agactcgcct 240
aggggaatgg tagaatctgc ttttgaattt gcaaggatat gccgaaagtt agactatcac 300
aattttgttt tttctatgaa agcaagcaac ccagttatca tggttcaggc ataccgctta 360
cttgtggctg aaatgtatgt ccaaggctgg gattatccat tacacttggg tgttactgaa 420
gctggagaag gtgaggatgg gaggatgaag tctgcaatag gcattggaac tcttcttcag 480
gatggattgg gtgataca 498

<210> 38
<211> 440
<212> DNA
<213> Glycine max

<400> 38

gtagctgaat gctttgataa gattcgtgta aaccctggaa attttgttga tagacgggct 60
caatttgaaa cattagagta cacagaagaa gactatcata aagaacttga gcatattgaa 120
aaggttttca caccattggt tgagaaatgt aagaaatatg ggagagcaat gcgcattggg 180
acaaaccatg gaagtctttc tgatcgtata atgagctact atggagactc gcctagggga 240
atggtagaat ctgcttttga atttgcaagg atatgccgaa agttagacta tcacaatttt 300
gttttttcta tgaaagcaag caacccagtt atcatggttc aggcataccg cttacttgtg 360
gctgaaatgt atgttcaagg ctgggattat ccattacact tgggtgttac tgaagctgga 420
aaaagtgagg atgggaggat 440

<210> 39
<211> 353
<212> DNA
<213> Glycine max

<400> 39

aattcggctc gagaggaact caaatcctgg ccaagatatt gctgaacttc aacctgcatc 60
cccaggaagc cctcttttgg ttcctaggca aaagtattgt gaatcattac acaaaactgt 120
caggagaaaa acaaacacag tgatggttgg taacgtggct attggtagcg agcatcctat 180
aagaattcag accatgacta cgactgacac taaggatgtt gctgggacag ttgaacaggt 240
gatgagaata gcagataaag gagctgatat tgtacggata acagttcaag ggaagaaaga 300
agctgatgct tgttttgaga ttaaaaacac ccttgttcaa aaaaattaca aca 353

<210> 40
<211> 577
<212> DNA
<213> Glycine max

<400> 40

gatgtttttg tcgtgtattc tattcctatt gcattcagct cactgatttc aattacaaag 60
tcaattttgt aaatcagagg cagagagagt tgtaaagagc ctctgaattt tgatcacacc 120

PCT/US01/24335
acacccttct tctcatctcc accagaaatg gctaccggag ctgctgtgcc aactacgttt 180
tctaccctca agacatggga ttccagtttg gggtttgcaa aaaacataga ttttgtgaga 240
gtttccgata tgaagagcat gaaatcttct gcgaggaaaa gggtgtcaat tatcaggaac 300
tcaaatcctg gccaagatat tgctgaactt caacctgcat ccccaggaag ccctcttttg 360
gttcctaggc aaaagtattg tgaatcattg cacaaaccca tcaggagaaa aacaagcaca 420
gtaatggttg gtaacgtggc tattggtagc gagcatccta taagaattca gaccatgact 480
acaactgaca ctaaggatgt tgctgggaca gttgaaccgg tgatgagaat agcagataaa 540
ggagctgata ttgtacggat aacagttcaa gggaaga 577


<210> 41
<211> 551
<212> DNA
<213> Glycine max

<400> 41

tggtgctggt tctgatgctg gagcccttct ggtggatggg cttggagatg gacttctttt 60
ggaagcgcca gacaaggatt ttgaatttat tagaaacact tctttcaatt tgttgcaagg 120
ctgcagaatg agaaatacaa agacagagta tgtctcatgt ccatcctgtg gcagaacatt 180
gtttgatctt caagaagtaa gtgcacaaat tcgggagaag acatcacacc tccccggtgt 240
ttcgattgca atcatgggat gcattgtaaa tggaccaggg gagatggctg atgcagactt 300
tgggtatgtg ggaggcactc ccgggaagat tgacctctat gttgggaaga ctgtggtgaa 360
gcgtggaatt gcaatggagc atgcaaccaa tgccttgatc gatctaataa aagaacatgg 420
acgatgggtg gaccctcctg ccgaggagta aaagcaagag cttaattttg agattggcat 480
tcaaggccat agtaagatga gcattgtcat atccaattat tggacacatg taatataagc 540
atacactcaa t 551


<210> 42
<211> 869
<212> DNA
<213> Glycine max

<400> 42

gaagcatagt agcatcaatg ccttccttat acagaagact aaaattagca gagtgcatgc 60
ggccaggcgg ttatttgagt acctatccga caattctcta aacttccctg ttattcacca 120


2 h na rrt - i~ f* 
LaL LL ay L l_v_ 


ccaaatggga 


ttcacagaga 


ugac l. ugg u a 


attggtgctg 


gttctgatgc 


180


tggagccctt ctggtggatg ggcttggaga tggacttctt ttggaagcgc cagacaagga 240
ttttgaattt attagaaaca cttctttcaa tttgttgcaa ggctgcagaa tgagaaatac 300
aaagacagag tatgtctcat gtccatcctg tggcagaaca ttgtttgatc ttcaagaagt 360
aagtgcacaa attcgggaga agacatcaca cctccctggt gtttcgattg caatcatggg 420
atgcattgta aatggaccag gggagatggc tgatgcagac tttgggtatg tgggaggcac 480
tcccgggaag attgacctct atgttgggaa gactgtggtg aagcgtggaa ttgcaatgga 540
gcatgcaacc aatgccttga tcgatctaat aaaagaacat ggacgatggg tggaccctcc 600
tgccgaggag taaaagcaag agcttaattt tgagattggc attcaaggcc atagtaagat 660


yaLjLd. L. Ly <— L. 


dL.cLL.l_0 cLd L L. 


4— 4 — /^r+^ ^ rt !55i — \ 

aLLyLaCaCa 




gacaacacuc 


aatgcccaag 


720


tttgagccta gttttaagtt ccttttgaga aagatcccaa ttaaagcttg ttgtgaggaa 780
atcgacagct agaacatgta tacagataac agtgtattgc tttgccccat cagccatcaa 840
taataatgag aatctcttag aatagtgcc 869



<210> 43
<211> 291
<212> DNA
<213> Glycine max
<220>
<221> unsure
<222> (1)..(291)
<223> unsure at all n locations

<400> 43

gangnactca aatcctgggc caagatattg ctgaacttca nccctgcatc cccaggnngc 60
cctcttttgg ttcctaggca aaagtattgt gaatcattnc cacaaaactg nccagganaa 120
aaacaaacac agtgatggtt ggtaacgtgg ctattggtag cgagcatcct ataagaattc 180
agaccatgac tacgacngac actaaggatg ttgctgggac agtngaacng gtgatgagaa 240
tagcagataa aggagctgat attgtacgga taacagttca agggaagaaa g 291


<210> 44
<211> 388
<212> DNA
<213> Glycine max

<400> 44

cccggtatat ggttcaggca taccgtttac ttgtggctga aatgtatgtc caaggctggg 60
attatccatt acacttgggt gttactgaag ctggagaagg tgaggatggg aggatgaagt 120
ctgcaattgg cattggaact cttcttcagg atggattggg tgatacaatt agggtttctc 180
tcacagaacc accagaagag gagatagatc cttgcagaag gttggcaaat cttggaatga 240
gagcttctga actccagaag ggggtggaac cttttgaaga aaagcacaga cattattttg 300
acttccagcg ccgatctggt caattgccag tgcaaaaaga gggtgaggag gtggattaca 360
gaggtgcact gcaccgtgac ggttctgt 388
<210> 45
<211> 211
<212> DNA
<213> Glycine max

<400> 45

cccggttatc atggcgcagg cataccgctt acttgtggct gaaatgtatg tccaaggctg 60
ggattatcca ttacacttgg gtgttactga agctggagga ggtgaggatg acaggatgaa 120
gtctgcaatt ggcattggaa ctcttcttca ggatggattg ggtgatacaa ttagggtgtc 180
tcgcacagaa ccaccagaag aggagataga t 211


<210> 46
<211> 276
<212> DNA
<213> Glycine max

<400> 46

tgggcttgga gatggactac ttttggaagc cccggacaag gattttgaat ttattagaaa 60
cacttctttc aatttgttgc aaggctgcag aatgagaaat acaaagacag agtatgtctc 120
atgtccatcc tgtggcagaa cattgtttga tcttcaagaa gtaagtgcac aaattcggga 180
gaagacatca cacctccctg gtgtttcgat tgcaatcatg ggatgcattg taaatggacc 240
aggggagatg gctgatgcag actttgggta tgtggg 276


<210> 47
<211> 399
<212> DNA
<213> Brassica napus

<400> 47

cccacgcgtc cgcagggatt cacagggacg agttggtgat ccacgcaggg acatacgctg 60
gggcacttct agtggatgga cttggagatg gtgtaatgct agaagcacct gatcaagact 120
tcgagtttct taggaacact tctttcaact tgttacaagg ctgcaggatg cgtaacacca 180
agacggaata cgtatcgtgc ccgtcttgtg gaagaactct gttcgacttg caagaaatca 240
gcgctgagat cagagaaaag acttcgcatt tgcctggcgt ttcgattgca ataatgggtt 300
gcattgtgaa tggacctggc gaaatggctg atgctgattt cggttatgta ggcggttctc 360
ccgggaaaat cgacctttac gttggaaaga cggtggtca 399
<210> 48
<211> 740
<212> PRT
<213> Arabidopsis thaliana

<400> 48

Met Ala Thr Gly Val Leu Pro Ala Pro Val Ser Gly Ile Lys Ile Pro
1 5 10 15

Asp Ser Lys Val Gly Phe Gly Lys Ser Met Asn Leu Val Arg Ile Cys
20 25 30

Asp Val Arg Ser Leu Arg Ser Ala Arg Arg Arg Val Ser Val Ile Arg
35 40 45

Asn Ser Asn Gln Gly Ser Asp Leu Ala Glu Leu Gln Pro Ala Ser Glu
50 55 60

Gly Ser Pro Leu Leu Val Pro Arg Gln Lys Tyr Cys Glu Ser Leu His
65 70 75 80

Lys Thr Val Arg Arg Lys Thr Arg Thr Val Met Val Gly Asn Val Ala
85 90 95

Leu Gly Ser Glu His Pro Ile Arg Ile Gln Thr Met Thr Thr Ser Asp
100 105 110

Thr Lys Asp Ile Thr Gly Thr Val Asp Glu Val Met Arg Ile Ala Asp
115 120 125

Lys Gly Ala Asp Ile Val Arg Ile Thr Val Gln Gly Lys Lys Glu Ala
130 135 140

Asp Ala Cys Phe Glu Ile Lys Asp Lys Leu Val Gln Leu Asn Tyr Asn
145 150 155 160

Ile Pro Leu Val Ala Asp Ile His Phe Ala Pro Thr Val Ala Leu Arg
165 170 175

Val Ala Glu Cys Phe Asp Lys Ile Arg Val Asn Pro Gly Asn Phe Ala
180 185 190

Asp Arg Arg Ala Gln Phe Glu Thr Ile Asp Tyr Thr Glu Asp Glu Tyr
195 200 205

Gln Lys Glu Leu Gln His Ile Glu Gln Val Phe Thr Pro Leu Val Glu
210 215 220

Lys Cys Lys Lys Tyr Gly Arg Ala Met Arg Ile Gly Thr Asn His Gly
225 230 235 240

Ser Leu Ser Asp Arg Ile Met Ser Tyr Tyr Gly Asp Ser Pro Arg Gly
245 250 255

Met Val Glu Ser Ala Phe Glu Phe Ala Arg Ile Cys Arg Lys Leu Asp
260 265 270

Tyr His Asn Phe Val Phe Ser Met Lys Ala Ser Asn Pro Val Ile Met
275 280 285

Val Gln Ala Tyr Arg Leu Leu Val Ala Glu Met Tyr Val His Gly Trp
290 295 300
Asp Tyr Pro Leu His Leu Gly Val Thr Glu Ala Gly Glu Gly Glu Asp
305 310 315 320

Gly Arg Met Lys Ser Ala Ile Gly Ile Gly Thr Leu Leu Gln Asp Gly
325 330 335

Leu Gly Asp Thr Ile Arg Val Ser Leu Thr Glu Pro Pro Glu Glu Glu
340 345 350

Ile Asp Pro Cys Arg Arg Leu Ala Asn Leu Gly Thr Lys Ala Ala Lys
355 360 365

Leu Gln Gln Gly Ala Pro Phe Glu Glu Lys His Arg His Tyr Phe Asp
370 375 380

Phe Gln Arg Arg Thr Gly Asp Leu Pro Val Gln Lys Glu Gly Glu Glu
385 390 395 400

Val Asp Tyr Arg Asn Val Leu His Arg Asp Gly Ser Val Leu Met Ser
405 410 415

Ile Ser Leu Asp Gln Leu Lys Ala Pro Glu Leu Leu Tyr Arg Ser Leu
420 425 430

Ala Thr Lys Leu Val Val Gly Met Pro Phe Lys Asp Leu Ala Thr Val
435 440 445

Asp Ser Ile Leu Leu Arg Glu Leu Pro Pro Val Asp Asp Gln Val Ala
450 455 460

Arg Leu Ala Leu Lys Arg Leu Ile Asp Val Ser Met Gly Val Ile Ala
465 470 475 480

Pro Leu Ser Glu Gln Leu Thr Lys Pro Leu Pro Asn Ala Met Val Leu
485 490 495

Val Asn Leu Lys Glu Leu Ser Gly Gly Ala Tyr Lys Leu Leu Pro Glu
500 505 510

Gly Thr Arg Leu Val Val Ser Leu Arg Gly Asp Glu Pro Tyr Glu Glu
515 520 525

Leu Glu Ile Leu Lys Asn Ile Asp Ala Thr Met Ile Leu His Asp Val
530 535 540

Pro Phe Thr Glu Asp Lys Val Ser Arg Val His Ala Ala Arg Arg Leu
545 550 555 560

Phe Glu Phe Leu Ser Glu Asn Ser Val Asn Phe Pro Val Ile His His
565 570 575

Ile Asn Phe Pro Thr Gly Ile His Arg Asp Glu Leu Val Ile His Ala
580 585 590

Gly Thr Tyr Ala Gly Gly Leu Leu Val Asp Gly Leu Gly Asp Gly Val
595 600 605

Met Leu Glu Ala Pro Asp Gln Asp Phe Asp Phe Leu Arg Asn Thr Ser
610 615 620

Phe Asn Leu Leu Gln Gly Cys Arg Met Arg Asn Thr Lys Thr Glu Tyr
625 630 635 640
Val Ser Cys Pro Ser Cys Gly Arg Thr Leu Phe Asp Leu Gln Glu Ile
645 650 655

Ser Ala Glu Ile Arg Glu Lys Thr Ser His Leu Pro Gly Val Ser Ile
660 665 670

Ala Ile Met Gly Cys Ile Val Asn Gly Pro Gly Glu Met Ala Asp Ala
675 680 685

Asp Phe Gly Tyr Val Gly Gly Ser Pro Gly Lys Ile Asp Leu Tyr Val
690 695 700

Gly Lys Thr Val Val Lys Arg Gly Ile Ala Met Thr Glu Ala Thr Asp
705 710 715 720

Ala Leu Ile Gly Leu Ile Lys Glu His Gly Arg Trp Val Asp Pro Pro
725 730 735

Val Ala Asp Glu
740



<210> 49

<211> 603

<212> PRT

<213> Oryza sativa

<400> 49

Met Val Gly Asn Val Pro Leu Gly Ser Asp His Pro Ile Arg Ile Gln
1 5 10 15

Thr Met Thr Thr Ser Asp Thr Lys Asp Val Ala Lys Thr Val Glu Glu
20 25 30

Val Met Arg Ile Ala Asp Lys Gly Ala Asp Phe Val Arg Ile Thr Val
35 40 45

Gln Gly Arg Lys Glu Ala Asp Ala Cys Phe Glu Ile Lys Asn Thr Leu
50 55 60

Val Gln Lys Asn Tyr Asn Ile Pro Leu Val Ala Asp Ile His Phe Ala
65 70 75 80

Pro Thr Val Ala Leu Arg Val Ala Glu Cys Phe Asp Lys Ile Arg Val
85 90 95

Asn Pro Gly Asn Phe Ala Asp Arg Arg Ala Gln Phe Glu Gln Leu Glu
100 105 110

Tyr Thr Glu Asp Asp Tyr Gln Lys Glu Leu Glu His Ile Glu Lys Val
115 120 125

Pro Asn Ile Ser Leu Phe Ser Val Asn Leu Val Phe Ser Pro Leu Val
130 135 140

Glu Lys Cys Lys Gln Tyr Gly Arg Ala Met Arg Ile Gly Thr Asn His
145 150 155 160

Gly Ser Leu Ser Asp Arg Ile Met Ser Tyr Tyr Gly Asp Ser Pro Arg
165 170 175

Gly Met Val Glu Ser Ala Leu Glu Phe Ala Arg Ile Cys Arg Lys Leu
180 185 190



Asp Phe His Asn Phe Val Phe Ser Met Lys Ala Ser Asn Pro Val Ile
195 200 205

Met Val Gln Ala Tyr Arg Leu Leu Val Ala Glu Met Tyr Asn Leu Gly
210 215 220

Trp Asp Tyr Pro Leu His Leu Gly Val Thr Glu Ala Gly Glu Gly Glu
225 230 235 240

Asp Gly Arg Met Lys Ser Ala Ile Gly Ile Gly Thr Leu Leu Met Asp
245 250 255

Gly Leu Gly Asp Thr Ile Arg Val Ser Leu Thr Glu Pro Pro Glu Glu
260 265 270

Glu Ile Asp Pro Cys Arg Arg Leu Ala Asn Leu Gly Thr His Ala Ala
275 280 285

Asp Leu Gln Ile Gly Val Ala Pro Phe Glu Glu Lys His Arg Arg Tyr
290 295 300

Phe Asp Phe Gln Arg Arg Ser Gly Gln Leu Pro Leu Gln Lys Glu Ala
305 310 315 320

Pro Glu Leu Leu Tyr Arg Ser Leu Ala Ala Lys Leu Val Val Gly Met
325 330 335

Pro Phe Lys Asp Leu Ala Thr Val Asp Ser Ile Leu Leu Lys Glu Leu
340 345 350

Pro Pro Val Glu Asp Ala Gln Ala Arg Leu Ala Leu Lys Arg Leu Val
355 360 365

Asp Ile Ser Met Gly Val Leu Thr Pro Leu Ser Glu Gln Leu Thr Lys
370 375 380

Pro Leu Pro His Ala Ile Ala Leu Val Asn Val Asp Glu Leu Ser Ser
385 390 395 400

Gly Ala His Lys Leu Leu Pro Glu Gly Thr Arg Leu Ala Val Thr Leu
405 410 415

Arg Gly Asp Glu Ser Tyr Glu Gln Leu Asp Leu Leu Lys Gly Val Asp
420 425 430

Asp Ile Thr Met Leu Leu His Ser Val Pro Tyr Gly Glu Glu Lys Thr
435 440 445

Gly Arg Val His Ala Ala Arg Arg Leu Phe Glu Tyr Leu Glu Thr Asn
450 455 460

Gly Leu Asn Phe Pro Val Ile His His Ile Glu Phe Pro Lys Ser Val
465 470 475 480

Asn Arg Asp Asp Leu Val Ile Gly Ala Gly Ala Asn Val Gly Ala Leu
485 490 495

Leu Val Asp Gly Leu Gly Asp Gly Val Leu Leu Glu Ala Ala Asp Gln
500 505 510

Glu Phe Glu Phe Leu Arg Asp Thr Ser Phe Asn Leu Leu Gln Gly Cys
515 520 525
Arg Met Arg Asn Thr Lys Thr Ile Ala Ile Met Gly Cys Ile Val Asn
530 535 540

Gly Pro Gly Glu Met Ala Asp Ala Asp Phe Gly Tyr Val Gly Gly Ala
545 550 555 560

Pro Gly Lys Ile Asp Leu Tyr Val Gly Lys Thr Val Val Gln Arg Gly
565 570 575

Ile Ala Met Glu Gly Ala Thr Asp Ala Leu Ile Gln Leu Ile Lys Asp
580 585 590

His Gly Arg Trp Val Asp Pro Pro Val Glu Glu
595 600



<210> 50

<211> 372

<212> PRT

<213> Escherichia coli

<400> 50

Met His Asn Gln Ala Pro Ile Gln Arg Arg Lys Ser Thr Arg Ile Tyr
1 5 10 15

Val Gly Asn Val Pro Ile Gly Asp Gly Ala Pro Ile Ala Val Gln Ser
20 25 30

Met Thr Asn Thr Arg Thr Thr Asp Val Glu Ala Thr Val Asn Gln Ile
35 40 45

Lys Ala Leu Glu Arg Val Gly Ala Asp Ile Val Arg Val Ser Val Pro
50 55 60

Thr Met Asp Ala Ala Glu Ala Phe Lys Leu Ile Lys Gln Gln Val Asn
65 70 75 80

Val Pro Leu Val Ala Asp Ile His Phe Asp Tyr Arg Ile Ala Leu Lys
85 90 95

Val Ala Glu Tyr Gly Val Asp Cys Leu Arg Ile Asn Pro Gly Asn Ile
100 105 110

Gly Asn Glu Glu Arg Ile Arg Met Val Val Asp Cys Ala Arg Asp Lys
115 120 125

Asn Ile Pro Ile Arg Ile Gly Val Asn Ala Gly Ser Leu Glu Lys Asp
130 135 140

Leu Gln Glu Lys Tyr Gly Glu Pro Thr Pro Gln Ala Leu Leu Glu Ser
145 150 155 160

Ala Met Arg His Val Asp His Leu Asp Arg Leu Asn Phe Asp Gln Phe
165 170 175

Lys Val Ser Val Lys Ala Ser Asp Val Phe Leu Ala Val Glu Ser Tyr
180 185 190

Arg Leu Leu Ala Lys Gln Ile Asp Gln Pro Leu His Leu Gly Ile Thr
195 200 205

Glu Ala Gly Gly Ala Arg Ser Gly Ala Val Lys Ser Ala Ile Gly Leu
210 215 220
Gly Leu Leu Leu Ser Glu Gly Ile Gly Asp Thr Leu Arg Val Ser Leu
225 230 235 240

Ala Ala Asp Pro Val Glu Glu Ile Lys Val Gly Phe Asp Ile Leu Lys
245 250 255

Ser Leu Arg Ile Arg Ser Arg Gly Ile Asn Phe Ile Ala Cys Pro Thr
260 265 270

Cys Ser Arg Gln Glu Phe Asp Val Ile Gly Thr Val Asn Ala Leu Glu
275 280 285

Gln Arg Leu Glu Asp Ile Ile Thr Pro Met Asp Val Ser Ile Ile Gly
290 295 300

Cys Val Val Asn Gly Pro Gly Glu Ala Leu Val Ser Thr Leu Gly Val
305 310 315 320

Thr Gly Gly Asn Lys Lys Ser Gly Leu Tyr Glu Asp Gly Val Arg Lys
325 330 335

Asp Arg Leu Asp Asn Asn Asp Met Ile Asp Gln Leu Glu Ala Arg Ile
340 345 350

Arg Ala Lys Ala Ser Gln Leu Asp Glu Ala Arg Arg Ile Asp Val Gln
355 360 365

Gln Val Glu Lys
370

<210> 51
<211> 25
<212> DNA

<213> Artificial Sequence
<220>

<223> Designed primer named CINCO
<400> 51

cgctgcccag aatggacctc cctag 25

<210> 52
<211> 26
<212> DNA

<213> Artificial Sequence
<220>

<223> Designed primer named SEIS
<400> 52

cagccgcgtt ttgacttgaa acgtgc 26
<210> 53
<211> 27
<212> DNA

<213> Artificial Sequence
<220>

<223> Designed primer named MPD-Nde5'
<400> 53

gccatatgac cgtttacaca gcatccg 27

<210> 54
<211> 35
<212> DNA

<213> Artificial Sequence
<220>

<223> Designed primer named MPD-Eco3'
<400> 54

tcgaattctc attattcctt tggtagacca gtctt 35

<210> 55
<211> 30
<212> DNA

<213> Artificial Sequence
<220>

<223> Designed primer named hPMK1
<400> 55

tggttaacat atggccccgc tgggaggcgc 30

<210> 56
<211> 40
<212> DNA

<213> Artificial Sequence
<220>

<223> Designed primer named hPMK4
<400> 56

aggttaactc aattaaagtc tggagcggat aaattctatc 40

<210> 57
<211> 25
<212> DNA

<213> Artificial Sequence
<220>

<223> Designed primer named UNO
<400> 57

cgggcctcgt ttggctgtcg cactg 25
<210> 58
<211> 25
<212> DNA

<213> Artificial Sequence
<220>

<223> Designed primer named DOS
<400> 58

cgcgggtgga aggaccttgt ggagg 25

<210> 59
<211> 33
<212> DNA

<213> Artificial Sequence
<220>

<223> Designed primer named MK-Hpa5'
<400> 59

aagttaacat atgtcattac cgttcttaac ttc 33

<210> 60
<211> 34
<212> DNA

<213> Artificial Sequence
<220>

<223> Designed primer named MK-Hpa3'
<400> 60

cggttaactc attatgaagt ccatggtaaa ttcg 34

<210> 61
<211> 30
<212> DNA

<213> Artificial Sequence
<220>

<223> Designed primer named idi5X
<400> 61

cccctcgaga ttatgcaaac ggaacacgtc 30

<210> 62
<211> 31
<212> DNA

<213> Artificial Sequence
<220>

<223> Designed primer named idi3X
<400> 62

ggctcgagtt atttaagctg ggtaaatgca g 31
<210> 63
<211> 32
<212> DNA

<213> Artificial Sequence
<220>

<223> Designed primer named pBAD-mut 1
<400> 63

ctgagagtgc accatctgcg gtgtgaaata cc 32

<210> 64
<211> 40
<212> DNA

<213> Artificial Sequence
<220>

<223> Designed primer named pBAD-Link1
<400> 64

aattctaagg aggtttaaac taaggaggta cgtaaggagg 40

<210> 65
<211> 40
<212> DNA

<213> Artificial Sequence
<220>

<223> Designed primer named pBAD-Link2
<400> 65

tcgacctcct tacgtacctc cttagtttaa acctccttag 40

<210> 66
<211> 21
<212> DNA

<213> Artificial Sequence
<220>

<223> Designed primer named pBAD-D2
<400> 66

tcatactccc gccattcaga g 21

<210> 67
<211> 21
<212> DNA

<213> Artificial Sequence
<220>

<223> Designed primer named pBAD-U3
<400> 67

ccgccaaaac agccaagctt g 21
<210> 68
<211> 28
<212> DNA

<213> Artificial Sequence
<220>

<223> Designed primer named pRS-L1
<400> 68

gatccgttta aacgcccggg cggccgcg 28

<210> 69
<211> 28
<212> DNA

<213> Artificial Sequence
<220>

<223> Designed primer named pRS-D2
<400> 69

aattcgcggc cgcccgggcg tttaaacg 28

<210> 70
<211> 22
<212> DNA

<213> Artificial Sequence
<220>

<223> Designed primer named 1PE
<400> 70

cgcggtgtgg gtgagcatga tg 22

<210> 71
<211> 30
<212> DNA

<213> Artificial Sequence
<220>

<223> Designed primer named 22PE
<400> 71

aaatctcccg ggttacccgt ctgttactgc 30

<210> 72
<211> 33
<212> DNA

<213> Artificial Sequence
<220>

<223> Designed primer named 3PE
<400> 72

gcgtttaaac tggacgaagc gcgtcgaatt gac 33
<210> 73
<211> 22
<212> DNA

<213> Artificial Sequence
<220>

<223> Designed primer named 4PE
<400> 73

tgcacgaccg cccagttgtt cc 22

<210> 74
<211> 21
<212> DNA

<213> Artificial Sequence
<220>

<223> Designed primer named CAT1
<400> 74

gagtccgaat aaatacctgt g 21

<210> 75
<211> 21
<212> DNA

<213> Artificial Sequence
<220>

<223> Designed primer named CAT4
<400> 75

ccgaatttct gccattcatc c 21

<210> 76
<211> 21
<212> DNA

<213> Artificial Sequence
<220>

<223> Designed primer named OPE
<400> 76

tgggctttgt cacgagcaca c 21

<210> 77
<211> 21
<212> DNA

<213> Artificial Sequence
<220>

<223> Designed primer named 5PE
<400> 77

ggcccatagc aaaaccgaca g 21
<210> 78
<211> 372
<212> PRT

<213> Escherichia coli
<400> 78

Met His Asn Gln Ala Pro Ile Gln Arg Arg Lys Ser Thr Arg Ile Tyr
1 5 10 15

Val Gly Asn Val Pro Ile Gly Asp Gly Ala Pro Ile Ala Val Gln Ser
20 25 30

Met Thr Asn Thr Arg Thr Thr Asp Val Glu Ala Thr Val Asn Gln Ile
35 40 45

Lys Ala Leu Glu Arg Val Gly Ala Asp Ile Val Arg Val Ser Val Pro
50 55 60

Thr Met Asp Ala Ala Glu Ala Phe Lys Leu Ile Lys Gln Gln Val Asn
65 70 75 80

Val Pro Leu Val Ala Asp Ile His Phe Asp Tyr Arg Ile Ala Leu Lys
85 90 95

Val Ala Glu Tyr Gly Val Asp Cys Leu Arg Ile Asn Pro Gly Asn Ile
100 105 110

Gly Asn Glu Glu Arg Ile Arg Met Val Val Asp Cys Ala Arg Asp Lys
115 120 125

Asn Ile Pro Ile Arg Ile Gly Val Asn Ala Gly Ser Leu Glu Lys Asp
130 135 140

Leu Gln Glu Lys Tyr Gly Glu Pro Thr Pro Gln Ala Leu Leu Glu Ser
145 150 155 160

Ala Met Arg His Val Asp His Leu Asp Arg Leu Asn Phe Asp Gln Phe
165 170 175

Lys Val Ser Val Lys Ala Ser Asp Val Phe Leu Ala Val Glu Ser Tyr
180 185 190

Arg Leu Leu Ala Lys Gln Ile Asp Gln Pro Leu His Leu Gly Ile Thr
195 200 205

Glu Ala Gly Gly Ala Arg Ser Gly Ala Val Lys Ser Ala Ile Gly Leu
210 215 220

Gly Leu Leu Leu Ser Glu Gly Ile Gly Asp Thr Leu Arg Val Ser Leu
225 230 235 240

Ala Ala Asp Pro Val Glu Glu Ile Lys Val Gly Phe Asp Ile Leu Lys
245 250 255

Ser Leu Arg Ile Arg Ser Arg Gly Ile Asn Phe Ile Ala Cys Pro Thr
260 265 270

Cys Ser Arg Gln Glu Phe Asp Val Ile Gly Thr Val Asn Ala Leu Glu
275 280 285

Gln Arg Leu Glu Asp Ile Ile Thr Pro Met Asp Val Ser Ile Ile Gly
290 295 300
Cys Val Val Asn Gly Pro Gly Glu Ala Leu Val Ser Thr Leu Gly Val
305 310 315 320

Thr Gly Gly Asn Lys Lys Ser Gly Leu Tyr Glu Asp Gly Val Arg Lys
325 330 335

Asp Arg Leu Asp Asn Asn Asp Met Ile Asp Gln Leu Glu Ala Arg Ile
340 345 350

Arg Ala Lys Ala Ser Gln Leu Asp Glu Ala Arg Arg Ile Asp Val Gln
355 360 365

Gln Val Glu Lys
370

<210> 79
<211> 740
<212> PRT
<213> Arabidopsis thaliana

<400> 79

Met Ala Thr Gly Val Leu Pro Ala Pro Val Ser Gly Ile Lys Ile Pro
1 5 10 15

Asp Ser Lys Val Gly Phe Gly Lys Ser Met Asn Leu Val Arg Ile Cys
20 25 30

Asp Val Arg Ser Leu Arg Ser Ala Arg Arg Arg Val Ser Val Ile Arg
35 40 45

Asn Ser Asn Gln Gly Ser Asp Leu Ala Glu Leu Gln Pro Ala Ser Glu
50 55 60

Gly Ser Pro Leu Leu Val Pro Arg Gln Lys Tyr Cys Glu Ser Leu His
65 70 75 80

Lys Thr Val Arg Arg Lys Thr Arg Thr Val Met Val Gly Asn Val Ala
85 90 95

Leu Gly Ser Glu His Pro Ile Arg Ile Gln Thr Met Thr Thr Ser Asp
100 105 110

Thr Lys Asp Ile Thr Gly Thr Val Asp Glu Val Met Arg Ile Ala Asp
115 120 125

Lys Gly Ala Asp Ile Val Arg Ile Thr Val Gln Gly Lys Lys Glu Ala
130 135 140

Asp Ala Cys Phe Glu Ile Lys Asp Lys Leu Val Gln Leu Asn Tyr Asn
145 150 155 160

Ile Pro Leu Val Ala Asp Ile His Phe Ala Pro Thr Val Ala Leu Arg
165 170 175

Val Ala Glu Cys Phe Asp Lys Ile Arg Val Asn Pro Gly Asn Phe Ala
180 185 190

Asp Arg Arg Ala Gln Phe Glu Thr Ile Asp Tyr Thr Glu Asp Glu Tyr
195 200 205

Gln Lys Glu Leu Gln His Ile Glu Gln Val Phe Thr Pro Leu Val Glu
210 215 220
Lys Cys Lys Lys Tyr Gly Arg Ala Met Arg Ile Gly Thr Asn His Gly
225 230 235 240

Ser Leu Ser Asp Arg Ile Met Ser Tyr Tyr Gly Asp Ser Pro Arg Gly
245 250 255

Met Val Glu Ser Ala Phe Glu Phe Ala Arg Ile Cys Arg Lys Leu Asp
260 265 270

Tyr His Asn Phe Val Phe Ser Met Lys Ala Ser Asn Pro Val Ile Met
275 280 285

Val Gln Ala Tyr Arg Leu Leu Val Ala Glu Met Tyr Val His Gly Trp
290 295 300

Asp Tyr Pro Leu His Leu Gly Val Thr Glu Ala Gly Glu Gly Glu Asp
305 310 315 320

Gly Arg Met Lys Ser Ala Ile Gly Ile Gly Thr Leu Leu Gln Asp Gly
325 330 335

Leu Gly Asp Thr Ile Arg Val Ser Leu Thr Glu Pro Pro Glu Glu Glu
340 345 350

Ile Asp Pro Cys Arg Arg Leu Ala Asn Leu Gly Thr Lys Ala Ala Lys
355 360 365

Leu Gln Gln Gly Ala Pro Phe Glu Glu Lys His Arg His Tyr Phe Asp
370 375 380

Phe Gln Arg Arg Thr Gly Asp Leu Pro Val Gln Lys Glu Gly Glu Glu
385 390 395 400

Val Asp Tyr Arg Asn Val Leu His Arg Asp Gly Ser Val Leu Met Ser
405 410 415

Ile Ser Leu Asp Gln Leu Lys Ala Pro Glu Leu Leu Tyr Arg Ser Leu
420 425 430

Ala Thr Lys Leu Val Val Gly Met Pro Phe Lys Asp Leu Ala Thr Val
435 440 445

Asp Ser Ile Leu Leu Arg Glu Leu Pro Pro Val Asp Asp Gln Val Ala
450 455 460

Arg Leu Ala Leu Lys Arg Leu Ile Asp Val Ser Met Gly Val Ile Ala
465 470 475 480

Pro Leu Ser Glu Gln Leu Thr Lys Pro Leu Pro Asn Ala Met Val Leu
485 490 495

Val Asn Leu Lys Glu Leu Ser Gly Gly Ala Tyr Lys Leu Leu Pro Glu
500 505 510

Gly Thr Arg Leu Val Val Ser Leu Arg Gly Asp Glu Pro Tyr Glu Glu
515 520 525

Leu Glu Ile Leu Lys Asn Ile Asp Ala Thr Met Ile Leu His Asp Val
530 535 540

Pro Phe Thr Glu Asp Lys Val Ser Arg Val His Ala Ala Arg Arg Leu
545 550 555 560
Phe Glu Phe Leu Ser Glu Asn Ser Val Asn Phe Pro Val Ile His His
565 570 575

Ile Asn Phe Pro Thr Gly Ile His Arg Asp Glu Leu Val Ile His Ala
580 585 590

Gly Thr Tyr Ala Gly Gly Leu Leu Val Asp Gly Leu Gly Asp Gly Val
595 600 605

Met Leu Glu Ala Pro Asp Gln Asp Phe Asp Phe Leu Arg Asn Thr Ser
610 615 620

Phe Asn Leu Leu Gln Gly Cys Arg Met Arg Asn Thr Lys Thr Glu Tyr
625 630 635 640

Val Ser Cys Pro Ser Cys Gly Arg Thr Leu Phe Asp Leu Gln Glu Ile
645 650 655

Ser Ala Glu Ile Arg Glu Lys Thr Ser His Leu Pro Gly Val Ser Ile
660 665 670

Ala Ile Met Gly Cys Ile Val Asn Gly Pro Gly Glu Met Ala Asp Ala
675 680 685

Asp Phe Gly Tyr Val Gly Gly Ser Pro Gly Lys Ile Asp Leu Tyr Val
690 695 700

Gly Lys Thr Val Val Lys Arg Gly Ile Ala Met Thr Glu Ala Thr Asp
705 710 715 720

Ala Leu Ile Gly Leu Ile Lys Glu His Gly Arg Trp Val Asp Pro Pro
725 730 735

Val Ala Asp Glu
740

<210> 80
<211> 155
<212> DNA
<213> Arabidopsis thaliana

<400> 80

aaaaatcgga aaaatggcga ctggagtatt gccagctccg gtttctggga tcaagatacc 60

ggattcgaaa gtcgggtttg gtaaaagcat gaatcttgtg agaatttgtg atgttaggag 120

tctaagatct gctgatgagt agatttcata aaagt 155

<210> 81
<211> 42
<212> PRT

<213> Arabidopsis thaliana
<400> 81

Met Ala Thr Gly Val Leu Pro Ala Pro Val Ser Gly Ile Lys Ile Pro
1 5 10 15

Asp Ser Lys Val Gly Phe Gly Lys Ser Met Asn Leu Val Arg Ile Cys
20 25 30

Asp Val Arg Ser Leu Arg Ser Ala Asp Glu
35 40
<210> 82
<211> 45
<212> DNA

<213> Arabidopsis thaliana
<400> 82

atgagaggat cgcaycayca ycaycaycay cayggatccg catgc 45

<210> 83
<211> 12
<212> PRT

<213> Arabidopsis thaliana
<400> 83

Met Arg Gly Ser His His His His His His Gly Ser
1 5 10

<210> 84
<211> 59
<212> DNA

<213> Arabidopsis thaliana
<400> 84

atgagaggat cgcaycayca ycaycaycay ggatctgctg atgagtagat ttcgcatgc 59

<210> 85
<211> 15
<212> PRT

<213> Arabidopsis thaliana
<400> 85

Met Arg Gly Ser His His His His His His Gly Ser Ala Asp Glu
1 5 10 15